LDL-receptor

ABSTRACT

A novel receptor, “LDL-receptor related protein-3” (“LRP-3”), is provided, along with nucleic acid encoding it. The gene is associated with type 1 diabetes (insulin dependent diabetes mellitus), and experimental evidence indicates that it is the IDDM susceptibility gene IDDM4. In various aspects the invention provides nucleic acid, including coding sequences, oligonucleotide primers and probes, polypeptides, pharmaceutical compositions, methods of diagnosis or prognosis, and other methods relating to and based on the gene, including methods of treatment of diseases in which the gene may be implicated, including autoimmune diseases, such as glomerulonephritis, diseases and disorders involving disruption of endocytosis and/or antigen presentation, diseases and disorders involving cytokine clearance and/or inflammation, viral infection, elevation of free fatty acids or hypercholesterolemia, osteoporosis, Alzheimer's disease, and diabetes.

This application claims benefit of PCT/GB98/01102, filed Apr. 15, 1998, and U.S. Provisional Application Nos. 60/043,553 and 60/048,740, filed Apr. 15, 1997 and Jun. 5, 1997, respectively.

FIELD OF THE INVENTION

The present invention relates to nucleic acids, polypeptides, oligonucleotide probes and primers, methods of diagnosis or prognosis, and other methods relating to and based on the identification of a gene which is characterised as a member of the LDL-receptor family and for which there are indications that some alleles are associated with susceptibility to insulin-dependent diabetes mellitus (“IDDM”), also known as type 1 diabetes.

More particularly, the present invention is based on the cloning and characterisation of a gene which the present inventors have termed “LDL-receptor related protein-5 (LRP5)” (previously “LRP-3”), based on characteristics of the encoded polypeptide which are revealed herein for the first time and which identify it as a member of the LDL receptor family. Furthermore, experimental evidence is included herein which indicates that LRP5 is the IDDM susceptibility gene IDDM4.

BACKGROUND OF THE INVENTION

Diabetes, the dysregulation of glucose homeostasis, affects about 6% of the general population. The most serious form, type 1 diabetes, which affects up to 0.4% of European-derived populations, is caused by autoimmune destruction of the insulin-producing β-cells of the pancreas, with a peak age of onset of 12 years. The β-cell destruction is irreversible, and despite insulin replacement by injection, patients suffer early mortality, kidney failure and blindness (Bach, 1994; Tisch and McDevitt, 1996). The major aim of genetic research, therefore, is to identify the genes predisposing to type 1 diabetes and to use this information to understand disease mechanisms and to predict and prevent the total destruction of β-cells and the disease.

The mode of inheritance of type 1 diabetes does not follow a simple Mendelian pattern, and the concordance of susceptibility genotype and the occurrence of disease is much less than 100%, as evidenced by the 30-70% concordance of identical twins (Matsuda and Kuzuya, 1994; Kyvik et al., 1995). Diabetes is caused by a number of genes or polygenes acting together in concert, which makes it particularly difficult to identify and isolate individual genes.

The main IDDM locus is the major histocompatibility complex (MHC) on chromosome 6p21 (IDDM1). The degree of familial clustering at this locus is λs=2.5, where λs = P expected [sharing of zero alleles at the locus identical-by-descent (IBD)] / P observed [sharing of zero alleles IBD] (Risch, 1987; Todd, 1994). A second locus, IDDM2 on chromosome 11p15, is the insulin minisatellite, with λs=1.25 (Bell et al., 1984; Thomson et al., 1989; Owerbach et al., 1990; Julier et al., 1991; Bain et al., 1992; Spielman et al., 1993; Davies et al., 1994; Bennett et al., 1995). These loci were initially detected by small case-control association studies, based on their status as functional candidates, and were later confirmed by further case-control, association and linkage studies.
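As an illustrative sketch, not part of the original analysis, the locus-specific λs defined above can be computed from the proportion of affected sibpairs sharing zero alleles IBD. It assumes full siblings, for whom the expected proportion sharing zero alleles IBD under no linkage is 1/4; the function name is hypothetical.

```python
def lambda_s_from_ibd0(z0_observed: float, z0_expected: float = 0.25) -> float:
    """Locus-specific sibling risk ratio from affected-sibpair IBD sharing.

    z0_expected: proportion of sibpairs sharing zero alleles IBD under no
    linkage (1/4 for full siblings).
    z0_observed: proportion of affected sibpairs sharing zero alleles IBD.
    """
    if not 0.0 < z0_observed <= 1.0:
        raise ValueError("z0_observed must lie in (0, 1]")
    return z0_expected / z0_observed


# lambda_s = 2.5 corresponds to only 10% of affected sibpairs sharing zero
# alleles IBD, and lambda_s = 1.25 to 20%.
print(round(lambda_s_from_ibd0(0.10), 3))
print(round(lambda_s_from_ibd0(0.20), 3))
```

The lower the observed zero-sharing proportion relative to 1/4, the stronger the clustering attributable to the locus.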

These two loci, however, cannot account for all the observed clustering of disease in families (λs=15), which is estimated from the ratio of the risk for siblings of patients to the population prevalence (6%/0.4%) (Risch, 1990). We initiated a positional cloning strategy in the hope of identifying the other loci causing susceptibility to type 1 diabetes, utilising the fact that markers linked to a disease gene will show an excess of alleles shared identical-by-descent in affected sibpairs (Penrose, 1953; Risch, 1990; Holmans, 1993).
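To make the shortfall concrete, the sketch below computes the overall λs from the stated sibling risk and prevalence and compares it with the product of the two locus-specific values. Combining loci by multiplication assumes a multiplicative model of the kind discussed by Risch (1990); this is an illustrative assumption, not a result reported here.

```python
import math


def sibling_risk_ratio(sibling_risk: float, prevalence: float) -> float:
    """Overall lambda_s: risk to siblings of patients divided by population prevalence."""
    return sibling_risk / prevalence


overall = sibling_risk_ratio(0.06, 0.004)  # 6% / 0.4%
known_loci = {"IDDM1 (MHC)": 2.5, "IDDM2 (insulin minisatellite)": 1.25}

# Assumption: loci act multiplicatively, so locus-specific lambda_s values multiply.
explained = math.prod(known_loci.values())
residual = overall / explained

print(f"overall lambda_s         = {overall:.1f}")    # 15.0
print(f"explained by IDDM1/IDDM2 = {explained:.3f}")  # 3.125
print(f"unexplained (ratio)      = {residual:.1f}")   # 4.8
```

Under this model, a residual factor of about 4.8 remains to be explained by other loci such as IDDM4.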

The initial genome-wide scan for linkage, utilising 289 microsatellite markers in 96 UK sibpair families, revealed evidence of linkage to an additional eighteen loci (Davies et al., 1994). Linkage to two of these loci was confirmed by analysis of two additional family sets (102 UK families and 84 USA families): IDDM4 on chromosome 11q13 (MLS 1.3, P=0.003 at FGF3) and IDDM5 on chromosome 6q (MLS 1.8 at ESR). At IDDM4 the most significant linkage was obtained in the subset of families sharing 1 or 0 alleles IBD at HLA (MLS=2.8; P=0.001; λs=1.2) (Davies et al., 1994). This linkage was also observed by Hashimoto et al. (1994) using 251 affected sibpairs, obtaining P=0.0008 in all sibpairs. Combining these results, with 596 families, provides substantial support for IDDM4 (P=1.5×10⁻⁶) (Todd and Farrall, 1996; Luo et al., 1996).

BRIEF DESCRIPTION OF THE INVENTION

The present inventors now disclose for the first time a gene encoding a novel member of the LDL-receptor family, which they term “LRP5” (previously “LRP-3”). Furthermore, evidence indicates that the gene represents the IDDM susceptibility locus IDDM4, the identification and isolation of which is a major scientific breakthrough.

Over the last 10 years many genes for single gene or monogenic diseases, which are relatively rare in the population, have been positioned by linkage analysis in families and localised to a region small enough to allow identification of the gene. Such sublocalisation and fine mapping can be carried out in rare single gene diseases because recombinations within families define the boundaries of the minimal interval beyond any doubt. In contrast, in common diseases such as diabetes or asthma the presence of the disease mutation does not always coincide with the development of the disease: disease susceptibility mutations in common disorders confer a risk of developing the disease, and this risk is usually much less than 100%. Hence, susceptibility genes in common diseases cannot be localised using recombination events within families unless tens of thousands of families are available to fine map the locus. Because collections of this size are impractical, investigators are contemplating the use of association mapping, which relies on historical recombination events during the history of the population from which the families came.

Association mapping has been used in over a dozen examples of rare single gene traits, particularly in genetically isolated populations such as Finland, to fine map disease mutations. Nevertheless, association mapping is fundamentally different from straightforward linkage mapping: although the degree of association between two markers, or between a marker and a disease mutation, generally decreases with increasing physical distance along the chromosome, this relationship can be unpredictable because it depends on the allele frequencies of the markers, the history of the population, and the age and number of mutations at the disease locus. For rare, highly penetrant single gene diseases there is usually one major founder chromosome in the population under study, making it relatively feasible to locate an interval that is smaller than one that can be defined by standard recombination events within living families. The resolution of this method in monogenic diseases with one main founder chromosome is certainly less than 2 cM, and in certain examples the resolution is down to 100 kb of DNA (Hastbacka et al. (1994) Cell 78: 1-20).
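The classical model below illustrates why association generally weakens with distance and why it also depends on population history: under random mating, the initial disequilibrium D_0 between a marker and a founder mutation decays by a factor (1 - θ) each generation, where θ is the recombination fraction. This deterministic sketch deliberately omits drift, mutation and multiple founder chromosomes, which are exactly the complications discussed in this section; the function name and example values are illustrative.

```python
def expected_ld(d0: float, theta: float, generations: int) -> float:
    """Expected disequilibrium D_t = D_0 * (1 - theta)^t under random mating.

    theta: recombination fraction between marker and disease mutation.
    Ignores drift, mutation, selection and multiple founder chromosomes.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return d0 * (1.0 - theta) ** generations


# A hypothetical founder mutation 100 generations old, with markers at
# roughly 1 cM and 0.1 cM (theta ~ 0.01 and 0.001).
for theta in (0.01, 0.001):
    print(theta, round(expected_ld(1.0, theta, 100), 3))
```

Over 100 generations, roughly a third of the initial association survives at θ = 0.01, versus about nine tenths at θ = 0.001, which is why historical recombination can resolve intervals far smaller than family-based linkage.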

In common diseases like type 1 diabetes, which are caused by a number of genes or polygenes acting together in concert, the population frequency of the disease allele may be very high, perhaps exceeding 50%, and there are likely to be several founder chromosomes, all of which impart risk rather than a 100% certainty of disease development. Because association mapping depends on unpredictable parameters, and because founder chromosomes will be several and common in the general population, fine mapping of polygenes is currently somewhat controversial, and many doubt the feasibility of any systematic genetic approach using a combination of linkage and association mapping. Recently, Risch and Merikangas have provided some mathematical background to the feasibility of association mapping in complex diseases (Science 273: 1516-1517, 1996), but they did not take into account the effect of multiple founder chromosomes.

As a result of these uncertainties, extremely large numbers of diabetic families must be genotyped with a large number of markers across a specific region, giving a linkage disequilibrium curve which may have several peaks. The question is which peak identifies the aetiological mutation, and how can this be established? To our knowledge, the linkage disequilibrium curves and haplotype association maps shown in FIGS. 3, 4, 19 and 20 are the first of their kind for any locus in any complex polygenic disease. Curves of this nature have not yet been published in the literature, even for the well-established IDDM1/MHC locus. In this respect the work described here is entirely novel and at the cutting edge of research into the genetics of polygenes.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates the approximate localisation of IDDM4 on chromosome 11q13: a multipoint linkage map of maximum likelihood IBD sharing in the subgroup of HLA 1:0 sharers among 150 families. An MLS of 2.3 at FGF3 and D11S1883 (λs=1.19) was obtained (Davies et al. (1994) Nature 371: 130-136).

FIG. 2 shows a physical map of the region D11S987-Galanin on chromosome 11q13. The interval was cloned in PACs, BACs and cosmids, and restriction mapped using a range of restriction enzymes to determine the physical distance between each marker.

FIG. 3 shows a single-point linkage disequilibrium curve at the IDDM4 region. 1289 families were analysed by TDT, with a peak at H0570POLYA (P=0.001). x-axis: physical distance in kb; y-axis: TDT χ² statistic (tdf).
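The TDT statistic plotted on the y-axis can be sketched for a single biallelic marker as below: among heterozygous parents, b counts transmissions of the test allele to affected offspring and c counts transmissions of the other allele. The counts used here are hypothetical, not taken from the 1289 families, and multiallelic markers such as H0570POLYA require an extended multiallelic TDT that this sketch does not cover.

```python
import math


def tdt_chi2(transmitted: int, untransmitted: int) -> float:
    """Transmission/disequilibrium test statistic (b - c)^2 / (b + c).

    transmitted (b): heterozygous parents transmitting the test allele.
    untransmitted (c): heterozygous parents transmitting the other allele.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / informative


def chi2_1df_pvalue(chi2: float) -> float:
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


stat = tdt_chi2(transmitted=60, untransmitted=40)  # hypothetical counts
print(stat, round(chi2_1df_pvalue(stat), 4))
```

Because only heterozygous parents are informative, the test compares transmitted with untransmitted alleles within families and is robust to population stratification.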

FIG. 4 shows a three-point rolling linkage disequilibrium curve at IDDM4, with 1289 families from four different populations (UK, USA, Sardinia and Norway). To minimise the effects of variation in allele frequency at each polymorphism, the TDT data were obtained at three consecutive markers and expressed as an average of the three. x-axis: physical distance in kb; y-axis: TDT χ² statistic.
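The three-point rolling average described for FIG. 4 can be sketched as follows. The choice to plot each average at the position of the middle marker is an assumption, and the marker positions and statistics are hypothetical.

```python
from typing import List, Sequence, Tuple


def rolling_three_point(
    positions_kb: Sequence[float], chi2: Sequence[float]
) -> List[Tuple[float, float]]:
    """Average the TDT statistics of each run of three consecutive markers.

    Assumption: each average is plotted at the position of the middle marker.
    """
    if len(positions_kb) != len(chi2):
        raise ValueError("positions and statistics must have the same length")
    return [
        (positions_kb[i], (chi2[i - 1] + chi2[i] + chi2[i + 1]) / 3.0)
        for i in range(1, len(chi2) - 1)
    ]


# Hypothetical marker positions (kb) and single-point TDT statistics.
curve = rolling_three_point([0, 20, 45, 60, 90], [1.0, 4.0, 7.0, 2.0, 0.5])
print(curve)
```

Averaging damps isolated spikes caused by a single marker's allele frequencies while preserving peaks supported by neighbouring markers.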

FIG. 5(a) shows the DNA sequence of the LRP5 isoform 1 cDNA (SEQ ID NO:1).

FIG. 5(b) shows the DNA sequence of the longest open reading frame present in the LRP5 cDNA (SEQ ID NO:2).

FIG. 5(c) shows the amino acid sequence translation (in standard single letter code) of the open reading frame in FIG. 5(b) (SEQ ID NO:3).

FIG. 5(d) shows the motifs of LRP5 isoform 1 (SEQ ID NO:3), encoded by the open reading frame contained in FIG. 5(b) (SEQ ID NO:2). Symbols: underlined residues 1-24 contain a signal for protein export and cleavage; □ indicates the position of an intron/exon boundary; * indicates a putative N-linked glycosylation site in the proposed extracellular portion of the receptor. The EGF-like motifs are shaded light gray, and the LDL-receptor ligand binding motifs are shaded a darker gray. The spacer regions are indicated by the underlined four amino acids with high similarity to the YWTD motif. A putative transmembrane spanning domain is underlined with a heavy line. Shaded areas in the cytoplasmic domain (residue 1409 to the end) may be involved in endocytosis.

FIG. 5(e) shows the amino acid sequence of the mature LRP5 protein (SEQ ID NO:4).

FIG. 5(f) shows the comparison of the nucleotide sequence of the first 432 nucleotides of the 5′ end of the human isoform 1 cDNA sequence (FIG. 5(a) (SEQ ID NO:1)) on the upper line (SEQ ID NO:5) with the first 493 nucleotides of the 5′ end of the mouse Lrp5 cDNA sequence (FIG. 16(a) (SEQ ID NO:35)) on the lower line (SEQ ID NO:6). The comparison was performed using the GCG algorithm GAP (Genetics Computer Group, Madison, Wis.).

FIG. 5(g) shows the comparison of the first 550 amino acids of human LRP5 isoform 1 (SEQ ID NO:7) with the first 533 amino acids of mouse Lrp5 (SEQ ID NO:8) using the GCG algorithm GAP (Genetics Computer Group, Madison, Wis.).

FIG. 6(a) shows the amino acid sequences of LRP5 motifs (SEQ ID NOS:9 to 22). A comparison was made using the program crossmatch (obtained from Dr. Phil Green, University of Washington) between the motifs present in LRP1 and the LRP5 amino acid sequence. The best match for each LRP5 motif is shown. For each motif, the top line is the LRP5 isoform 1 amino acid sequence, the middle line shows the amino acids that are identical in the two motifs, and the lower line is the amino acid sequence of the best-matching LRP1 motif. Of particular note are the conserved cysteine (C) residues that are the hallmark of both the EGF-precursor and LDL-receptor ligand binding motifs (SEQ ID NOS:9-22).

FIG. 6(b) illustrates the motif organization of the LDL-receptor and LRP5. The LDL-receptor ligand binding motifs are represented by the light gray boxes and the EGF-like motifs by the dark gray boxes. The YWTD spacer motifs are indicated by the vertical lines. The putative transmembrane domains are represented by the black box.

FIG. 7 shows the LRP5 gene structure. The DNA sequences of contiguous pieces of genomic DNA are represented by the heavy lines, drawn to the indicated scale. The positions of the markers D11S1917(UT5620), H0570POLYA, L3001CA, D11S1337 and D11S970 are indicated. The exons are indicated by the small black boxes with their numerical or alphabetical name below; the exons are not drawn to scale.

FIG. 8 illustrates different LRP5 gene isoforms. Alternatively spliced 5′ ends of the LRP5 gene are indicated with the isoform number for each alternatively spliced form. The light gray arrow indicates the start of translation, which occurs in exon 6 in isoform 1, may occur upstream of exon 1 in isoform 3, and occurs in exon B in isoforms 2, 4, 5 and 6. The core 22 exons (A to V) are represented by the box.

FIG. 9 is a SNP map of Contig 57. Polymorphisms were identified by comparison of the DNA sequence of BAC 14-1-15 with cosmids EO 864 and BO 7185. Corresponding Table 6 indicates a PCR amplicon that includes the site of each polymorphism, the nature of the single nucleotide polymorphism (SNP), its location and the restriction site that is altered, if any. The line represents the contiguous genomic DNA with the relative locations of the polymorphisms and the amplicons used to detect them. The large thin triangles represent the sites of putative exons. The marker H0570POLYA is indicated.

FIG. 10 is a SNP map of Contig 58. Polymorphisms were identified by comparison of the DNA sequence of BAC 14-1-15 with cosmid BO 7185. Corresponding Table 6 indicates a PCR amplicon that includes the site of each polymorphism, the nature of the single nucleotide polymorphism (SNP), its location and the restriction site that is altered, if any. The line represents the contiguous genomic DNA with the relative locations of the polymorphisms and the amplicons used to detect them. The large thin triangle at the very end of the line represents exon A of LRP5.

FIG. 11(a) shows the DNA sequence of the isoform 2 cDNA (SEQ ID NO:23).

FIG. 11(b) shows the longest open reading frame of isoform 2 (also isoforms 4, 5 and 6) (SEQ ID NO:24).

FIG. 11(c) shows the amino acid sequence of isoform 2 (also isoforms 4, 5 and 6) (SEQ ID NO:25), encoded by the open reading frame of FIG. 11(b).

FIG. 12(a) shows the DNA sequence of isoform 3 cDNA (SEQ ID NO:26).

FIG. 12(b) shows sequence obtained by GRAIL and a putative extension of isoform 3 (SEQ ID NO:27).

FIG. 12(c) shows a putative open reading frame for isoform 3 (SEQ ID NO:28).

FIG. 12(d) shows the amino acid sequence of isoform 3 (SEQ ID NO:29).

FIG. 12(e) shows the GRAIL-predicted promoter sequence for isoform 3 (SEQ ID NO:30).

FIG. 13 shows the DNA sequence of the isoform 4 cDNA (SEQ ID NO:31), which contains an open reading frame encoding isoform 2 (FIG. 11(b)).

FIG. 14 shows the DNA sequence of the isoform 5 cDNA (SEQ ID NO:32), which contains an open reading frame encoding isoform 2 (FIG. 11(b)).

FIG. 15(a) shows the DNA sequence of isoform 6 (SEQ ID NO:33), which contains an open reading frame encoding isoform 2 (FIG. 11(b)).

FIG. 15(b) shows the GRAIL-predicted promoter sequence associated with isoform 6 (SEQ ID NO:34).

FIG. 16(a) shows the DNA sequence of a portion of the mouse Lrp5 cDNA (SEQ ID NO:35).

FIG. 16(b) shows the DNA sequence of the 5′ extension of the mouse clone (SEQ ID NO:36).

FIG. 16(c) shows the DNA sequence of a portion of the open reading frame of mouse Lrp5 (SEQ ID NO:37).

FIG. 16(d) shows the amino acid sequence of the open reading frame encoding a portion of mouse Lrp5 (SEQ ID NO:8).

FIG. 17(a) shows the DNA sequence of exons A to V (SEQ ID NO:38).

FIG. 17(b) shows the amino acid sequence (SEQ ID NO:39) encoded by an open reading frame contained in FIG. 17(a).

FIG. 18(a) shows the nucleotide sequence of the full length mouse Lrp5 cDNA (SEQ ID NO:40).

FIG. 18(b) shows the nucleotide sequence of the longest open reading frame present in the mouse Lrp5 cDNA (SEQ ID NO:41).

FIG. 18(c) shows the amino acid sequence translation (in single letter code) of the open reading frame in FIG. 18(b) (SEQ ID NO:42).

FIG. 18(d) shows an alignment of the amino acid sequences of the human LRP5 protein and the mouse Lrp5 protein (SEQ ID NOS:3 and 42) made using the GCG algorithm GAP (Genetics Computer Group, Madison, Wis.).

FIG. 18(e) shows an alignment of the amino acid sequence of the mature human LRP5 protein with the mature mouse Lrp5 protein (SEQ ID NOS:43 and 44) made using the GCG algorithm GAP (Genetics Computer Group, Madison, Wis.).

FIG. 19 shows a schematic representation of haplotypes across the IDDM4 region. Three distinct haplotypes are shown. Haplotype A is protective against IDDM, whereas haplotypes B and C are susceptible/non-protective for IDDM.

FIG. 20 shows a schematic representation of single nucleotide polymorphism (SNP) haplotypes across the IDDM4 region. Haplotype A is protective, whereas haplotypes B, C, D and E are susceptible/non-protective. A minimal region of 25 kb which is identical by descent (IBD) for the four susceptible haplotypes is indicated. The SNP designations, e.g. 57-3, are as described in Table 6 and FIGS. 9 and 10.
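The notion of a minimal region shared by all susceptible haplotypes can be sketched as a search for the longest run of consecutive SNPs at which every susceptible haplotype carries the same allele. This uses identity by state as a proxy for identity by descent and assumes phased haplotypes without genotyping error; the haplotype strings and function name are hypothetical. Mapping the first and last SNP of the run to physical positions would give its length in kb.

```python
from typing import Optional, Sequence, Tuple


def longest_shared_run(haplotypes: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (first, last) SNP indices of the longest run of consecutive SNPs
    at which every haplotype carries the same allele, or None if there is none."""
    if not haplotypes:
        return None
    n = len(haplotypes[0])
    if any(len(h) != n for h in haplotypes):
        raise ValueError("haplotypes must cover the same SNPs")
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for i in range(n):
        if all(h[i] == haplotypes[0][i] for h in haplotypes):
            if start is None:
                start = i  # a new shared run begins here
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
        else:
            start = None  # an allele differs, so the run is broken
    return best


# Hypothetical alleles at six ordered SNPs for four susceptible haplotypes.
susceptible = ["ACGTTA", "TCGTAA", "ACGTTC", "GCGTTA"]
print(longest_shared_run(susceptible))  # (1, 3)
```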

LRP5 GENE STRUCTURE

The gene identified contains 22 exons, termed A-V, which encode most of the mature LRP5 protein. The 22 exons account for 4961 nucleotides of the LRP5 gene transcript (FIG. 5(a) (SEQ ID NO:1)) and are located within approximately 110 kb of genomic DNA. The genomic DNA containing these exons begins downstream of the genetic marker L3001CA and includes the genetic markers D11S1337, 14lca5 and D11S970 (FIG. 7). Several different 5′ ends of the LRP5 transcript have been identified. Of particular interest is isoform 1, with a 5′ end encoding a signal peptide sequence for protein export (secretory leader peptide) across the plasma membrane. As discussed below, the LRP5 protein is likely to contain a large extracellular domain, so this protein would be expected to have a signal sequence. The exon encoding the signal sequence, termed exon 6, lies near the genetic marker H0570POLYA. This exon is 35 kb upstream of exon A and thus extends the genomic DNA comprising the LRP5 gene to at least 160 kb.

Several additional isoforms of the LRP5 gene that arise from alternative splicing of the 5′ end have been identified by PCR (FIG. 8). The functional relevance of these additional isoforms is not clear. Two of these LRP5 transcripts contain exon 1, which is located upstream of the genetic marker D11S1917(UT5620) and expands the LRP5 gene to approximately 180 kb of genomic DNA. The transcript termed isoform 3 consists of exon 1 spliced directly to exon A. The reading frame is open at the 5′ end, and thus there is the potential for additional coding information in exons upstream of exon 1. Alternatively, centromeric extension of exon 1 to include all of the open reading frame associated with this region yields the open reading frame for isoform 3.

The second transcript that contains exon 1 also contains exon 5, which is located near the genetic marker H0570POLYA. The open reading frame for this isoform, isoform 2, begins in exon B and thus encodes a truncated LRP5 protein which lacks any predicted secretory leader peptide in the first 100 amino acids. There are three additional transcripts, each with an open reading frame beginning in exon B and with 5′ ends near the genetic marker L3001CA.
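The "longest open reading frame" used to define the isoforms can be illustrated with a simple forward-strand scan that records each ATG-initiated frame up to its first in-frame stop codon. This is a simplified sketch (standard genetic code, ATG start only, forward strand, toy sequence); it is not the software used for the analyses described here, and it would not report a frame such as that of isoform 3, which is open at its 5′ end.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end) of the longest ATG-initiated ORF on the forward strand.

    end is exclusive and includes the stop codon; (0, 0) means no complete ORF.
    ORFs open at the 5' end (no ATG) or lacking a stop codon are not reported.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG in this frame
    return best


cdna = "CCATGAAATTTTAGGG"  # hypothetical toy cDNA
s, e = longest_orf(cdna)
print(s, e, cdna[s:e])  # 2 14 ATGAAATTTTAG
```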

Expression Profile of LRP5

Northern blot analysis indicates that the major mRNA transcript of the LRP5 gene is approximately 5 to 5.5 kb and is most highly expressed in liver, pancreas, prostate and placenta. Expression is also detected in skeletal muscle, kidney, spleen, thymus, ovary, lung, small intestine and colon. Minor bands both larger and smaller than 5 kb are detected and may represent alternative splicing events or related family members.

LRP5 is a Member of the LDL-receptor Family

The gene identified in the IDDM4 locus, LRP5, is a member of the LDL-receptor family. This family of proteins has several distinguishing characteristics: a large extracellular domain containing cysteine-rich motifs which are involved in ligand binding, a single transmembrane spanning domain, and an “NPXY” (SEQ ID NO:45) internalization motif (Krieger and Herz (1994) Ann. Rev. Biochem. 63: 601-637). The functional role of the members of this family is the clearance of their ligands by receptor-mediated endocytosis. This is illustrated by the most highly characterized member of the family, the LDL-receptor, which is responsible for the clearance of LDL cholesterol from plasma (Goldstein et al. (1985) Ann. Rev. Cell Biol. 1: 1-39).

LRP5 is most closely related to the LDL-receptor related protein (LRP), which is also known as the alpha2-macroglobulin receptor. Translation of the open reading frame (ORF) of isoform 1 yields the LRP5 protein. Comparison of the LRP5 protein to human LRP1 using the algorithm GAP (Genetics Computer Group, Madison, Wis.) reveals an overall amino acid similarity of 55% and identity of 34% to the region of the human LRP1 protein from amino acids 1236 to 2934. The DNA of this ORF is 45% identical to LRP1-encoding DNA as indicated by GAP. A slightly lower but significant level of similarity is seen with the megalin receptor, also termed LRP2 and gp330 (Saito et al. (1994) Proc. Natl. Acad. Sci. 91: 9725-9729), as well as with the Drosophila vitellogenin receptor (Schonbaum et al. (1995) Proc. Natl. Acad. Sci. 92: 1485-1489). Similarity is also observed with other members of the LDL-receptor family, including the LDL-receptor (Suedhof et al. (1985) Science 228: 815-822) and the VLDL receptor (Oka et al. (1994) Genomics 20: 298-300). Due to the presence of EGF-like motifs in LRP5, similarity is also observed with the EGF precursor and the nidogen precursor, which are not members of the LDL-receptor family.
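How an identity figure such as 34% is derived from a pairwise alignment can be sketched as below. Here identity is counted over aligned columns in which neither sequence has a gap; GAP and other programs apply their own gap conventions and compute similarity from a substitution-scoring matrix, so this illustrates the idea rather than reimplementing GAP. The aligned fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Hypothetical aligned fragments (not LRP5 or LRP1 sequence).
print(percent_identity("NGGCS-HLC", "NGGCSQHIC"))  # 87.5
```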

Properties and Motifs of LRP5

The N-terminal portion of LRP5 likely contains a signal sequence cleavage site. Signal sequences are frequently found in proteins that are exported across the plasma membrane (von Heijne (1994) Ann. Rev. Biophys. Biomol. Struc. 23: 167-192). In addition, other members of the LDL-receptor family contain a signal sequence for protein export.

The presence of a signal sequence cleavage site was initially identified by comparing human LRP5 with a mouse cDNA sequence that we obtained. The initial partial mouse cDNA sequence, of 1711 nucleotides (FIG. 16(a) (SEQ ID NO:35)), is 87% identical to the human LRP5 cDNA over a portion of approximately 1500 nucleotides and thus is likely to be the mouse ortholog (Lrp5) of human LRP5. The cloned portion of the mouse cDNA contains an open reading frame (FIG. 16(c) (SEQ ID NO:37)) encoding 533 amino acids. The initiating codon has consensus nucleotides for efficient translation at both the −3 (purine) and +4 (G nucleotide) positions (Kozak, M. (1996) Mammalian Genome 7: 563-574). A 500 amino acid portion of mouse Lrp5 (FIG. 5(g) and FIG. 16(d) (SEQ ID NO:8)) is 96% identical to human LRP5, further supporting the proposal that this is the mouse ortholog of LRP5.
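The two consensus positions mentioned above can be checked with a short function. It uses the usual numbering, in which the A of the initiating ATG is +1, so −3 lies three nucleotides upstream and +4 is the nucleotide immediately after the codon. The example sequence is the canonical vertebrate Kozak context GCCACCATGG, used purely for illustration; it is not the mouse Lrp5 sequence, and the function name is hypothetical.

```python
from typing import Dict

PURINES = {"A", "G"}


def kozak_context(mrna: str, atg_index: int) -> Dict[str, bool]:
    """Check the -3 purine and +4 G positions around an initiating ATG/AUG.

    atg_index is the 0-based index of the A of the ATG (position +1).
    """
    seq = mrna.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("sequence too short around the ATG")
    return {
        "minus3_purine": seq[atg_index - 3] in PURINES,
        "plus4_G": seq[atg_index + 3] == "G",
    }


print(kozak_context("GCCACCATGG", 6))  # {'minus3_purine': True, 'plus4_G': True}
```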

Significantly, the first 200 nucleotides of the mouse cDNA have very little similarity to the 5′ extensions present in isoforms 2-6 described above. By contrast, this sequence is 75% identical to the human sequence of exon 6, which comprises the 5′ end of isoform 1. Thus isoform 1, which encodes a signal peptide for protein export, likely represents the most biologically relevant form of LRP5.

Importantly, both the human LRP5 and mouse Lrp5 open reading frames encode a peptide with the potential to act as a eukaryotic signal sequence for protein export (von Heijne (1994) Ann. Rev. Biophys. Biomol. Struc. 23: 167-192). The highest-scoring signal sequence, as determined using the SigCleave program in the GCG analysis package (Genetics Computer Group, Madison, Wis.), generates a mature peptide beginning at residue 25 of human LRP5 and residue 29 of mouse Lrp5 (FIGS. 5(d) and 5(g)). Additional sites that may be utilized produce mature peptides in human LRP5 beginning at amino acid residues 22, 23, 23, 26, 27, 28, 30 or 32. Additional cleavage sites in mouse Lrp5 result in mature peptides beginning at amino acid residue 31, 32, 33 or 38 (FIG. 5(g) (SEQ ID NO:8)). The mature human LRP5 protein is shown in FIG. 5(e) (SEQ ID NO:4).
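A greatly simplified stand-in for cleavage-site prediction lists every position satisfying the (-3, -1) rule of von Heijne, i.e. small residues at positions -3 and -1 relative to the cleavage site. SigCleave instead scores a position-specific weight matrix, so this sketch only illustrates why several alternative mature N-termini can be proposed for one precursor; the residue set, length limits, function name and toy precursor are all assumptions.

```python
from typing import List

# Simplified (-3, -1) rule (an assumption): small residues at -3 and -1.
SMALL_RESIDUES = set("AGSCT")


def candidate_mature_starts(
    precursor: str, min_signal: int = 15, max_signal: int = 40
) -> List[int]:
    """1-based residue numbers at which a mature peptide could begin.

    A site is accepted when the residues at -1 and -3 relative to the
    cleavage point are both small and the signal is min_signal..max_signal
    residues long.
    """
    seq = precursor.upper()
    starts = []
    # i is the 0-based index of the -1 residue; the signal then has i + 1 residues.
    for i in range(min_signal - 1, min(max_signal, len(seq) - 1)):
        if seq[i] in SMALL_RESIDUES and seq[i - 2] in SMALL_RESIDUES:
            starts.append(i + 2)  # first mature residue, 1-based
    return starts


toy = "MKK" + "L" * 12 + "AGA" + "QEDDDD"  # hypothetical precursor
print(candidate_mature_starts(toy))  # [19]
```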

The other alternative isoforms of LRP5 lack a signal sequence near the N-terminus of the encoded protein. The functional relevance of these additional isoforms is not known; however, there are several exported proteins which lack a signal sequence and are transported by a signal peptide-independent mechanism (Higgins, C. F. (1992) Ann. Rev. Cell Biol. 8: 67-113). Thus it is possible that the putative extracellular domain of these isoforms is translocated across the plasma membrane.

The extracellular domain of members of the LDL receptor family contains multiple motifs, each with six cysteine residues within a region of approximately 40 amino acids (Krieger and Herz (1994) Ann. Rev. Biochem. 63: 601-637). Several classes of these cysteine-rich motifs have been defined based on the spacing of the cysteine residues and the nature of other conserved amino acids within the motif. The LDL-receptor ligand binding (class A) motif is distinguished by a cluster of acidic residues in the C-terminal portion of the motif which includes a highly conserved SDE sequence. The importance of this acidic region in ligand binding has been demonstrated by mutagenesis studies (Russell et al. (1989) J. Biol. Chem. 264: 21682-21688). Three LDL-receptor ligand binding motifs are found in the LRP5 protein (FIG. 6(a) (SEQ ID NOS:9 to 22)). The EGF-like (class B) motif lacks the cluster of acidic residues present in the LDL-receptor ligand binding motif. In addition, the spacing of the cysteine residues in the EGF-like motifs differs from that in the LDL-receptor ligand binding motif. The LRP5 protein contains 4 EGF-precursor (B.2) motifs, which are characterized by an NGGCS motif between the first and second cysteine residues (FIG. 6(a) (SEQ ID NOS:9 to 22)).
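The distinguishing features just described can be expressed as a toy classifier for a single cysteine-rich module. This heuristic is an assumption made for illustration: it is not the crossmatch comparison used for FIG. 6(a), real assignments depend on cysteine spacing as well as these signatures, and the example segments are hypothetical.

```python
def classify_cysteine_module(segment: str) -> str:
    """Rough class call for a ~40-residue cysteine-rich module.

    Heuristic: six cysteines are required; an SDE acidic signature suggests
    class A (LDL-receptor ligand binding), and an NGGC signature suggests
    class B.2 (EGF-like).
    """
    seg = segment.upper()
    if seg.count("C") < 6:
        return "not a six-cysteine module"
    if "SDE" in seg:
        return "class A (LDL-receptor ligand binding)"
    if "NGGC" in seg:
        return "class B.2 (EGF-like)"
    return "unclassified cysteine-rich module"


# Hypothetical segments with six cysteines each.
class_a_like = "CxxCxxCxxCxxSDExxCxxC".replace("x", "A")
egf_like = "CxxxCNGGCxxCxxCxxC".replace("x", "A")
print(classify_cysteine_module(class_a_like))  # class A (LDL-receptor ligand binding)
print(classify_cysteine_module(egf_like))      # class B.2 (EGF-like)
```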

The size of the members of the LDL receptor family and the number of cysteine-rich repeats in the extracellular domain vary greatly. LRP1 is a large protein of 4544 amino acids and contains 31 LDL-receptor ligand binding motifs (class A) and 22 EGF-like motifs (class B) (Herz et al. (1988) EMBO J. 7: 4119-4127). Similarly, the megalin receptor, LRP2, is a protein of 4660 amino acids and consists of 36 LDL-receptor ligand binding motifs and 17 EGF-like motifs (Saito et al. (1994) PNAS 91: 9725-9729). In contrast, the LDL receptor is a relatively small protein of 879 amino acids which contains 7 LDL-ligand binding motifs and 3 EGF-like motifs. The predicted size of the mature LRP5 protein, 1591 amino acids, is intermediate between LRP1 and the LDL receptor. As indicated above, the LRP5 protein contains four EGF-like motifs and three LDL-ligand binding motifs. It has been postulated that the multiple motif units, particularly evident in LRP1 and LRP2, account for the ability of these proteins to bind multiple lipoprotein and protein ligands (Krieger and Herz (1994) Ann. Rev. Biochem. 63: 601-637).

The arrangement of the LDL-receptor ligand binding and EGF-like motifs relative to each other is similar in the LDL receptor, LRP1 and LRP2. In each of these proteins multiple LDL-ligand binding motifs are grouped together and followed by at least one EGF-like motif (Herz et al. (1988) EMBO J. 7: 4119-4127). By contrast, in the LRP5 protein an EGF-like motif precedes the group of three LDL-ligand binding motifs (FIG. 6(b)). An additional property unique to LRP5 is that its LDL-ligand binding motifs are followed by the putative transmembrane domain. The different arrangement of the motifs may define LRP5 as a member of a new subfamily within the LDL-receptor related protein family.

LRP5 has a signal peptide for protein export at the N-terminus of the protein. Signal peptide cleavage yields a mature LRP5 protein which begins with an EGF precursor spacer domain from amino acids 31-297 (amino acid residue numbers are based upon the LRP5 precursor). The EGF precursor spacer domain is composed of five repeats of approximately 50 amino acids, each containing the characteristic sequence motif Tyr-Trp-Thr-Asp (YWTD) (SEQ ID NO:46). There are three additional spacer domains, from amino acids 339-602, 643-903 and 944-1214. Each spacer domain is followed by an EGF repeat: amino acids 297-338 (egf1), 603-642 (egf2), 904-943 (egf3) and 1215-1255 (egf4). The EGF repeats contain six conserved cysteine residues and are of the B.2 class, which has an Asn-Gly-Gly-Cys (NGGC) (SEQ ID NO:47) motif as a feature (Herz et al. (1988) EMBO J. 7: 4119-4127) (FIG. 6(a) (SEQ ID NOS:9 to 22)). A single unit, defined as an EGF precursor spacer domain plus an EGF repeat, is repeated four times in LRP5. The last EGF repeat is adjacent to three consecutive LDLR repeats, from amino acids 1257-1295 (ldlr1), 1296-1333 (ldlr2) and 1334-1372 (ldlr3). The LDLR repeats have the conserved cysteine residues as well as the motif Ser-Asp-Glu (SDE) as a characteristic feature (FIG. 6(a) (SEQ ID NOS:9 to 22)). Thirteen amino acids separate the LDLR repeats from the putative transmembrane spanning domain of 23 amino acids, at residues 1386-1408. The putative extracellular domain of LRP5 has six potential sites for N-linked glycosylation, at amino acid residues 93, 138, 446, 499, 705 and 878 (FIG. 5(d) (SEQ ID NO:3)).
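Potential N-linked glycosylation sites such as those listed above are conventionally located by scanning for the sequon Asn-X-Ser/Thr, where X is any residue except proline. A minimal sketch follows; the example peptide is hypothetical, and a sequon match only marks a potential site, since actual glycosylation also requires that the site be exposed in the lumen or extracellular space.

```python
import re
from typing import List

# N-X-S/T with X not proline; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> List[int]:
    """1-based positions of asparagines in N-X-S/T sequons (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(n_glycosylation_sites("MNGSANPTKNVT"))  # [2, 10]
```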

The intracellular domain of LRP5 comprises 207 amino acids, which is longer than in most members of the family but similar in size to that of LRP2 (Saito et al. (1994) PNAS 91: 9725-9729). It does not exhibit similarity to the LDL-receptor family, nor is it similar to any other known proteins. The cytoplasmic domain of LRP5 is rich in proline (16%) and serine (15%) residues (FIG. 5(d) (SEQ ID NO:3)). Most members of the LDL-receptor family contain a conserved NPXY motif in the cytoplasmic domain which has been implicated in endocytosis by coated pits (Chen et al. (1990) J. Biol. Chem. 265: 3116-3123). Mutagenesis studies have indicated that the critical residue for recognition by components of the endocytotic machinery is the tyrosine residue (Davis et al. (1987) Cell 45: 15-24). Replacement of the tyrosine residue by phenylalanine or tryptophan is tolerated; thus the minimal requirement for this residue appears to be that it is an aromatic amino acid (Davis et al. (1987) Cell 45: 15-24). Structural studies have indicated that the critical function of the NP residues is to provide a beta-turn that presents the aromatic residue (Bansal and Gierasch (1991) Cell 67: 1195-1201).
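The residue numbers given in this section can be cross-checked with simple arithmetic. Note that the precursor length of 1615 residues is inferred here from the stated boundaries (cytoplasmic domain from residue 1409, 207 residues long), not stated in the text; it agrees with the 1591-residue mature protein given earlier once the 24-residue signal peptide is removed. The composition helper is a hypothetical illustration of how percentages such as 16% proline would be computed from the cytoplasmic sequence.

```python
# Residue numbers taken from the text (precursor numbering).
TM_START, TM_END = 1386, 1408  # putative transmembrane domain
LDLR3_END = 1372               # end of the third LDLR repeat
CYTO_LENGTH = 207              # intracellular domain
MATURE_START = 25              # highest-scoring SigCleave site

tm_length = TM_END - TM_START + 1          # residues in the TM domain
linker = TM_START - LDLR3_END - 1          # residues between ldlr3 and the TM domain
precursor_length = TM_END + CYTO_LENGTH    # cytoplasmic domain runs 1409 to the C-terminus
mature_length = precursor_length - (MATURE_START - 1)

print(tm_length, linker, precursor_length, mature_length)  # 23 13 1615 1591


def residue_fraction(seq: str, residues: str) -> float:
    """Fraction of positions in seq occupied by any of the given residues."""
    return sum(seq.count(r) for r in residues) / len(seq)


print(residue_fraction("PSPSAAPP", "P"))  # hypothetical fragment: 0.5
```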

Although the cytoplasmic domain of LRP5 does not contain an NPXY motif, it has several aromatic residues that lie in putative turn regions (FIG. 5(d) (SEQ ID NO:3)) and thus may be involved in facilitating endocytosis. In particular, tyrosine 1473, which occurs in the sequence VPLY (SEQ ID NO:48), has the proline and tyrosine in the correct positions relative to the consensus motif. The NPXY motif has been implicated in endocytosis in several proteins, but it is not an absolute requirement: some proteins that lack the NPXY motif, e.g. the transferrin receptor, nonetheless undergo endocytosis by coated pits (Chen et al. (1990) J. Biol. Chem. 265:3116-3123). In any event, we anticipate that the primary function of this protein will be receptor-mediated endocytosis of its ligand.
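The motif logic in the two preceding paragraphs can be written as simple pattern searches. The Python sketch below looks for two patterns. The first is the canonical NPXY motif, with the tyrosine relaxed to any aromatic residue (F, W or Y) to reflect the tolerated substitutions described above. The second is the looser xPxY arrangement seen in VPLY. The patterns, the function name `find_motif` and the short test fragment are illustrative assumptions, and the fragment is not the LRP5 cytoplasmic sequence (see FIG. 5(d)). A sequence pattern alone cannot show that a residue lies in a turn or that the endocytic machinery recognises it.

```python
import re

# Zero-width lookaheads so that overlapping tetrapeptides are all reported.
NPXY_AROMATIC = r"(?=(NP.[FWY]))"  # NPXY with the Y relaxed to any aromatic residue
XPXY = r"(?=(.P.Y))"               # proline two residues before a tyrosine, as in VPLY


def find_motif(seq: str, pattern: str, first_residue: int = 1) -> list[tuple[int, str]]:
    """Return (number of the motif's last residue, tetrapeptide) for each match.

    first_residue is the precursor number of seq[0], so positions are reported
    in the numbering used for the full-length protein.
    """
    return [(m.start() + 3 + first_residue, m.group(1))
            for m in re.finditer(pattern, seq.upper())]


if __name__ == "__main__":
    fragment = "GSVPLYDSNPAF"  # hypothetical fragment, not the LRP5 sequence
    print(find_motif(fragment, XPXY, first_residue=1468))           # [(1473, 'VPLY')]
    print(find_motif(fragment, NPXY_AROMATIC, first_residue=1468))  # [(1479, 'NPAF')]
```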

Potential Roles of LRP5

The ability of members of the LDL-receptor family to bind multiple ligands suggests that LRP5 may function to bind one or more ligands. Moreover, by analogy with other members of the family, once a ligand is bound, the LRP5 receptor-ligand complex would be endocytosed, clearing the ligand from the extracellular milieu. The LRP5 ligand may be a lipid, a protein, a protein complex, or a lipoprotein, and may possess a variety of functions. The physiological function of LRP1, the most closely related member of the LDL-receptor family, is uncertain, but LRP1 does possess a number of biochemical activities. LRP1 binds to alpha-2 macroglobulin. Alpha-2 macroglobulin is a plasma complex that contains a “bait” ligand for a variety of proteinases, e.g. trypsin, chymotrypsin, pancreatic elastase and plasma kallikrein (Jensen (1989) J. Biol. Chem. 20:11539-11542). Once the proteinase binds and enzymatically cleaves the “bait”, alpha-2 macroglobulin undergoes a conformational change and “traps” the proteinase. LRP rapidly clears the proteinase:alpha-2 macroglobulin complex. This mechanism scavenges proteinases that have the potential to mediate a variety of biological functions, e.g. antigen processing and proteinase secretion (Strickland et al. (1990) J. Biol. Chem. 265:17401-17404). The prenatal death of Lrp1 knockout mice shows the importance of this function (Zee et al. (1994) Genomics 23:256-259).

Antigen presentation is a critical component in the development of IDDM, as the pivotal role of MHC haplotypes in conferring disease susceptibility shows (Tisch and McDevitt (1996) Cell 85:291-297). By analogy with LRP1, LRP5 may play a role in antigen presentation. If so, polymorphisms within this gene could affect the development of autoimmunity in the type 1 diabetic patient.

The alpha-2 macroglobulin complex also binds cytokines and growth factors such as interleukin-1 beta, interleukin-2, interleukin-6, transforming growth factor-beta, and fibroblast growth factor (Moestrup and Gliemann (1991) J. Biol. Chem. 266:14011-14017). Thus the alpha-2 macroglobulin receptor has the potential to play a role in the clearance of cytokines and growth factors. The role of cytokines in mediating immune and inflammatory responses is well established. For example, the interleukin-2 gene is a strong candidate gene for the Idd3 locus in the non-obese diabetic mouse, an animal model for type 1 diabetes (Denny et al. (1997) Diabetes 46:695-700). If LRP5 binds alpha-2 macroglobulin or related complexes, then it may play a role in the immune response by mediating cytokine clearance. For example, LRP5 is expressed in the pancreas, the target tissue of IDDM, and may help clear cytokines from the inflammatory infiltrate (insulitis) that is ongoing in the disease. A polymorphism in LRP5 that reduces the ability of LRP5 to clear cytokines may increase an individual's susceptibility to developing IDDM. Furthermore, a polymorphism that increases the ability of LRP5 to clear cytokines may protect an individual from developing IDDM. Conversely, certain cytokines counteract other cytokines. Removal of certain beneficial cytokines by LRP5 may therefore confer disease susceptibility, in which case a polymorphism that reduces LRP5 activity may confer protection from developing the disease.

Increases in free fatty acids (FFA) have been shown to reduce insulin secretion in animals (Boden et al. (1997) Diabetes 46:3-10). In addition, ApoE, which is a ligand for the LDL-receptor, has been associated with an antioxidant activity (Miyata and Smith (1996) Nature Genet. 14:55-61). Oxidative damage is a central pathogenic mechanism in pancreatic β-cell destruction in type 1 diabetes (Bach (1994) Endocrin. Rev. 15:516-542). Thus alterations in the ability of LRP5 to bind ApoE and related lipoproteins may influence the susceptibility of pancreatic β-cells to oxidative damage. Transfection of forms of LRP5 into β-cells may help β-cells resist damage by the immune system in autoimmunity and in transplantation.

A pharmacological entity termed the lipolysis-stimulated receptor (LSR), which binds and endocytoses chylomicron remnants in the presence of FFA, has been described (Mann et al. (1995) Biochemistry 34:10421-10431). One possible role for the LRP5 gene product is that it is responsible for this activity.

Another member of the LRP family is LRP2, also known as megalin and gp330. This protein has been implicated in Heymann's nephritis, an autoimmune disease of the kidney in rats (Saito et al. (1994) PNAS 91:9725-9729). Heymann's nephritis is a model of glomerulonephritis and is characterized by the development of autoantibodies to the alpha-2 macroglobulin receptor associated protein, also known as the Heymann nephritis antigen. The Heymann nephritis antigen binds to LRP2 (Strickland et al. (1991) J. Biol. Chem. 266:13364-13369). LRP2 may play a role in this disease by clearing this pathogenic protein. In an analogous manner, the function of LRP5 may be to bind and clear proteins in the pancreas against which the IDDM patient has generated autoantibodies. Alternatively, LRP5 itself may be an autoantigen in the IDDM patient.

LRP1 has been identified as the receptor for certain bacterial toxins (Krieger and Herz (1994) Ann. Rev. Biochem. 63:601-637) and for the human rhinovirus (Hofer et al. (1994) Proc. Natl. Acad. Sci. 91:1839-42). A viral infection may alter an individual's susceptibility to IDDM (Epstein (1994) N. Eng. J. Med. 331:1428-1436). If certain viruses use LRP5 as a mode of entry into the cell, then polymorphisms in LRP5 may alter an individual's susceptibility to type 1 diabetes.

Alterations in LRP5 may participate in the pathogenesis of other diseases. LRP1 binds lipoproteins such as apoE and C-apolipoproteins. The primary role of the LDL receptor is the clearance of lipoproteins such as apoE and apoB, and mutations in the LDL receptor lead to hypercholesterolemia (Chen et al. (1990) J. Biol. Chem. 265:3116-3123). Therefore mutations in LRP5 that decrease the ability of the protein to scavenge lipoproteins may cause an elevation in cholesterol. Variations in LRP5 could predispose diabetics to the development of macrovascular complications, the major cause of death in diabetes. In type 2 diabetics, pancreatic pathology is characterised by the deposition of amyloid. Amyloid deposition may decrease pancreatic β-cell function. LRP5 could function in the metabolism of islet amyloid and influence susceptibility to type 2 diabetes as well as type 1 diabetes. The role of ApoE in Alzheimer's disease indicates that proteins such as LRP1, and possibly LRP5, have the potential to contribute to the pathogenesis of this disease.

A locus involved in the development of osteoporosis-pseudoglioma syndrome has been mapped to a 3-cM region of chromosome 11 which includes the gene encoding LRP5 (Gong et al. (1996) Am. J. Hum. Genet. 59:146-151). The pathogenic mechanism of this disease is unknown but is believed to involve a regulatory role. Patients with the syndrome have aberrant vascular growth in the vitreoretina. Two lines of evidence suggest that LRP5 may play a role in this disease: the chromosomal location of the gene, and its potential role in clearing fibroblast growth factor, a mediator of angiogenesis. This proposed function could also be connected with the development of retinopathy in diabetes.

Polymorphisms in the LRP5 Gene

The exons of the LRP5 gene are being scanned for polymorphisms. Several polymorphisms that change an amino acid in LRP5 have been identified in IDDM patients (Table 5). Of particular interest is a C to T transition, which changes an Ala codon to Val, in one of the three conserved LDL receptor ligand binding motifs. In addition to this polymorphism, a C to T transition was identified in the codon for Asn²⁰⁹ (with no effect on the encoded amino acid), and three polymorphisms were identified in intronic sequences flanking the exons. An additional set of polymorphisms has been identified by comparing experimentally derived cDNA sequences with the genomic DNA sequence (Table 5). Some of these polymorphisms will be analyzed in a large number of IDDM patients and control individuals to determine their association with IDDM.
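The effect of a single-base change on a codon follows directly from the genetic code. Alanine codons have the form GCN, so a C to T change that produces valine must occur at the second position (GCN to GTN). Asparagine codons are AAY, so a C to T change with no effect on the encoded amino acid must be AAC to AAT at the third position. The Python sketch below classifies such changes using the standard genetic code. The specific codon GCC and the function name `classify_snv` are illustrative assumptions. The actual LRP5 codons are in Table 5 and the sequence figures.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, b1 in enumerate(BASES)
               for j, b2 in enumerate(BASES)
               for k, b3 in enumerate(BASES)}
PURINES, PYRIMIDINES = set("AG"), set("CT")


def classify_snv(codon: str, position: int, alt: str) -> dict[str, str]:
    """Describe a single-base change at a 1-based position (1-3) of a sense-strand codon."""
    codon, alt = codon.upper(), alt.upper()
    ref = codon[position - 1]
    if alt == ref:
        raise ValueError("alternative base equals reference base")
    mutant = codon[:position - 1] + alt + codon[position:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    # Transitions stay within purines (A/G) or within pyrimidines (C/T).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    if alt_aa == ref_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return {"change": f"{ref}>{alt}",
            "class": "transition" if same_class else "transversion",
            "ref_aa": ref_aa, "alt_aa": alt_aa, "effect": effect}


if __name__ == "__main__":
    print(classify_snv("GCC", 2, "T"))  # Ala (A) -> Val (V): missense transition
    print(classify_snv("AAC", 3, "T"))  # Asn (N) unchanged: synonymous transition
```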

Approximately 30 single nucleotide polymorphisms (SNPs) were identified in the genomic DNA sequences of overlapping BAC and cosmid clones surrounding the genetic marker poly A. The contiguous genomic sequences containing these polymorphisms have been termed contig 57 (FIG. 9) and contig 58 (FIG. 10). Contig 57 contains exons 1 and 5 along with the genetic markers poly A and D11S1917 (UT5620). Contig 58 contains the genetic marker L3001ca and part of exon A.

Additional Experimental Evidence

A region of identity-by-descent associated with type 1 diabetes has been identified in the 5′ portion of the LRP5 gene. By combining data from SNPs and microsatellite markers, we have identified a region that is identical-by-descent in susceptible haplotypes. The minimal region spans 25 kb and contains the putative regulatory regions of LRP5 and the first exon. This strengthens the genetic evidence for LRP5 being a diabetes risk gene. Therefore therapies that affect LRP5 may be useful in the prevention and treatment of type 1 diabetes.
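A toy calculation can illustrate the idea of a minimal shared region. The Python sketch below assumes phased susceptible haplotypes typed at ordered markers. It reports the longest run of consecutive markers at which all haplotypes carry the same allele. This simplification is not the analysis used here. A real identity-by-descent assessment also weighs allele frequencies and marker informativeness. The true boundaries also lie somewhere between the outermost shared markers and the first unshared markers beyond them. The marker positions, the alleles and the function name `longest_shared_run` are invented for illustration.

```python
from typing import Optional, Sequence


def longest_shared_run(positions_kb: Sequence[float],
                       haplotypes: Sequence[Sequence[object]]) -> Optional[tuple[float, float]]:
    """Span (first_kb, last_kb) of the longest run of consecutive markers at which
    every haplotype carries the same allele, or None if no marker is shared.

    positions_kb must be sorted; haplotypes[h][i] is the allele of haplotype h
    at marker i (SNP bases or microsatellite repeat sizes both work).
    """
    shared = [len({hap[i] for hap in haplotypes}) == 1 for i in range(len(positions_kb))]
    best: Optional[tuple[float, float]] = None
    run_start: Optional[int] = None
    for i, is_shared in enumerate(shared + [False]):  # sentinel closes a trailing run
        if is_shared and run_start is None:
            run_start = i
        elif not is_shared and run_start is not None:
            span = (positions_kb[run_start], positions_kb[i - 1])
            if best is None or span[1] - span[0] > best[1] - best[0]:
                best = span
            run_start = None
    return best


if __name__ == "__main__":
    positions = [0, 10, 18, 30, 43, 55]           # hypothetical marker positions (kb)
    susceptible = ["ACGTTA", "ACGTCA", "GCGTTA"]  # hypothetical phased haplotypes
    print(longest_shared_run(positions, susceptible))  # (10, 30)
```

With these invented data, the shared run spans 10 to 30 kb. The region could extend toward the unshared flanking markers at 0 and 43 kb.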

Overexpression of LRP5 in mice provides evidence that LRP5 affects lipoprotein metabolism. Statistically significant evidence for modulation of triglycerides by LRP5 has been obtained. Thus therapies that affect LRP5 may be useful in the treatment of cardiovascular disease and conditions where serum triglycerides are elevated.

Suggestive evidence was obtained that LRP5 reduces serum cholesterol when it is above normal. There is also evidence that LRP5 can interact with very low-density lipoprotein particles and reduce their levels in serum. Therefore therapies that affect LRP5 may be useful in the treatment of cardiovascular disease and conditions where serum cholesterol levels are elevated.

Biochemical studies indicate that LRP5 has the capacity to function in the uptake of low-density lipoprotein (LDL) particles. Thus therapies that affect LRP5 may be useful in the treatment of cardiovascular disease where LDL levels are elevated.

Overexpression of LRP5 in mice provided statistically significant evidence for a reduction in serum alkaline phosphatase. A reduction in serum alkaline phosphatase is consistent with LRP5 playing a role in modulation of the immune response. This provides evidence for LRP5 participating in the pathogenesis of type 1 diabetes. Therefore therapies that affect LRP5 may be useful in the treatment of autoimmune diseases.

Cellular localization of LRP5 indicates that it is expressed in a particular subtype of mature tissue macrophages, the phagocytic macrophages. Evidence from the literature indicates that this class of macrophages is involved in autoimmune disease, which supports a role for LRP5 in autoimmune disease and type 1 diabetes. Therefore therapies that affect LRP5 may be useful in the treatment of autoimmune diseases.

Full-length cDNAs for both human and mouse LRP5 have been obtained. Antibodies directed against LRP5 have been developed. These reagents provide tools to further analyze the biological function of LRP5.

Irrespective of LRP5's actual mode of action and involvement in IDDM and other diseases, the experimental work described herein establishes and supports the practical applications which are disclosed as aspects and embodiments of the present invention.

According to one aspect of the present invention there is provided a nucleic acid molecule which has a nucleotide sequence encoding a polypeptide which includes the amino acid sequence shown in FIG. 5(c) (SEQ ID NO:3), FIG. 5(d) (SEQ ID NO:3) or FIG. 5(e) (SEQ ID NO:4). The amino acid sequence of FIG. 5(c) (SEQ ID NO:3) includes that of FIG. 5(e) (SEQ ID NO:4) and a signal sequence.

The coding sequence may be that included in FIG. 5(a) (SEQ ID NO:1) or FIG. 5(b) (SEQ ID NO:2), or it may be a mutant, variant, derivative or allele of the sequence shown. The sequence may differ from that shown by a change which is one or more of addition, insertion, deletion and substitution of one or more nucleotides of the sequence shown. Changes to a nucleotide sequence may or may not result in an amino acid change at the protein level, as determined by the genetic code.

Thus, nucleic acid according to the present invention may include a sequence different from the sequence shown in FIG. 5(a) (SEQ ID NO:1) or FIG. 5(b) (SEQ ID NO:2) yet encode a polypeptide with the same amino acid sequence. The amino acid sequence shown in FIG. 5(c) (SEQ ID NO:3) consists of 1615 residues.
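The degeneracy of the genetic code is what allows a different nucleotide sequence to encode the same polypeptide. The short Python sketch below assumes the standard genetic code and counts how many distinct coding sequences translate to a given peptide. The YWTD and NGGC motifs mentioned earlier serve as example inputs. The function name `count_encoding_sequences` is illustrative.

```python
from collections import Counter
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, b1 in enumerate(BASES)
               for j, b2 in enumerate(BASES)
               for k, b3 in enumerate(BASES)}
# Number of codons for each amino acid (e.g. Leu 6, Tyr 2, Trp 1).
CODONS_PER_AMINO_ACID = Counter(CODON_TABLE.values())


def count_encoding_sequences(protein: str) -> int:
    """Distinct DNA coding sequences (stop codon excluded) that translate to protein."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in protein.upper())


if __name__ == "__main__":
    print(count_encoding_sequences("YWTD"))  # 16  (2 x 1 x 4 x 2)
    print(count_encoding_sequences("NGGC"))  # 64  (2 x 4 x 4 x 2)
```

Applied to the full 1615-residue sequence of FIG. 5(c), the same product becomes astronomically large. This is why the specification defines variants by the encoded amino acid sequence rather than by a single nucleotide sequence.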

On the other hand, the encoded polypeptide may comprise an amino acid sequence which differs by one or more amino acid residues from the amino acid sequence shown in FIG. 5(c) (SEQ ID NO:3). The present invention further provides nucleic acid encoding a polypeptide which is an amino acid sequence mutant, variant, derivative or allele of the sequence shown in FIG. 5(c) (SEQ ID NO:3). Such polypeptides are discussed below. Nucleic acid encoding such a polypeptide may show, at the nucleotide sequence and/or encoded amino acid level, greater than about 60% homology with the coding sequence shown in FIG. 5(a) (SEQ ID NO:1) and/or the amino acid sequence shown in FIG. 5(c) (SEQ ID NO:3), greater than about 70% homology, greater than about 80% homology, greater than about 90% homology or greater than about 95% homology. For amino acid “homology”, this may be understood to be similarity (according to the established principles of amino acid similarity, e.g. as determined using the algorithm GAP (Genetics Computer Group, Madison, Wis.)) or identity. GAP uses the Needleman and Wunsch algorithm to align two complete sequences in a way that maximizes the number of matches and minimizes the number of gaps. Generally, the default parameters are used, with a gap creation penalty=12 and gap extension penalty=4. Use of either of the terms “homology” and “homologous” herein does not imply any necessary evolutionary relationship between compared sequences. This is in keeping, for example, with standard use of terms such as “homologous recombination”, which merely requires that two nucleotide sequences are sufficiently similar to recombine under the appropriate conditions. Further discussion of polypeptides according to the present invention, which may be encoded by nucleic acid according to the present invention, is found below.
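GAP itself is a GCG program, and the sketch below is not GAP and does not reproduce its scoring. It is a minimal Python illustration of Needleman and Wunsch global alignment in its affine-gap form (Gotoh's variant), using the gap creation penalty of 12 and extension penalty of 4 quoted above. The sketch rests on three assumptions. First, a gap of length k costs 12 + 4k. Second, matches and mismatches score +5 and -4; this is a toy scheme, and protein comparisons would normally use a substitution matrix. Third, percent identity is computed over all alignment columns, gaps included, whereas other tools may use a different denominator, such as the length of the shorter sequence. The function names are illustrative.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def _argmax(values: List[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def global_align(a: str, b: str, match: int = 5, mismatch: int = -4,
                 gap_open: int = 12, gap_extend: int = 4) -> Tuple[str, str, float]:
    """Affine-gap global alignment; a gap of length k costs gap_open + gap_extend * k.

    Returns both aligned strings and percent identity over all alignment columns.
    """
    n, m = len(a), len(b)
    # States: 0 = a[i-1] paired with b[j-1]; 1 = a[i-1] against a gap; 2 = gap against b[j-1].
    score = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    back = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    score[0][0][0] = 0.0
    for i in range(1, n + 1):
        score[1][i][0] = -(gap_open + gap_extend * i)
        back[1][i][0] = 0 if i == 1 else 1
    for j in range(1, m + 1):
        score[2][0][j] = -(gap_open + gap_extend * j)
        back[2][0][j] = 0 if j == 1 else 2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = [score[s][i - 1][j - 1] for s in range(3)]
            k = _argmax(diag)
            score[0][i][j], back[0][i][j] = diag[k] + sub, k
            up = [score[0][i - 1][j] - gap_open - gap_extend,
                  score[1][i - 1][j] - gap_extend,          # extend an existing gap
                  score[2][i - 1][j] - gap_open - gap_extend]
            k = _argmax(up)
            score[1][i][j], back[1][i][j] = up[k], k
            left = [score[0][i][j - 1] - gap_open - gap_extend,
                    score[1][i][j - 1] - gap_open - gap_extend,
                    score[2][i][j - 1] - gap_extend]        # extend an existing gap
            k = _argmax(left)
            score[2][i][j], back[2][i][j] = left[k], k
    # Trace back from the best-scoring state in the bottom-right cell.
    state = _argmax([score[s][n][m] for s in range(3)])
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        prev = back[state][i][j]
        if state == 0:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif state == 1:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
        state = prev
    aligned_a, aligned_b = "".join(reversed(out_a)), "".join(reversed(out_b))
    matches = sum(x == y != "-" for x, y in zip(aligned_a, aligned_b))
    identity = 100.0 * matches / len(aligned_a) if aligned_a else 0.0
    return aligned_a, aligned_b, identity


if __name__ == "__main__":
    print(global_align("ACGT", "AGT"))  # ('ACGT', 'A-GT', 75.0)
```

Separate opening and extension costs favour one long gap over several short ones. This is the behaviour the two GAP parameters are meant to control.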

The present invention extends to nucleic acid that hybridizes with any one or more of the specific sequences disclosed herein under stringent conditions. For detection of sequences that are about 80-90% identical, such as detection of mouse LRP5 with a human probe or vice versa, suitable conditions include hybridization overnight at 42° C. in 0.25M Na₂HPO₄, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 55° C. in 0.1×SSC, 0.1% SDS. For detection of sequences that are greater than about 90% identical, suitable conditions include hybridization overnight at 65° C. in 0.25M Na₂HPO₄, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 60° C. in 0.1×SSC, 0.1% SDS.

The coding sequence may be included within a nucleic acid molecule which has the sequence shown in FIG. 5(a) (isoform 1) (SEQ ID NO:1) or FIG. 5(b) (SEQ ID NO:2) and encode the full polypeptide of isoform 1 (FIG. 5(c) (SEQ ID NO:3)). Mutants, variants, derivatives and alleles of these sequences are included within the scope of the present invention in terms analogous to those set out in the preceding paragraph and in the following disclosure.

Also provided by the present invention in various aspects and embodiments is a nucleic acid molecule encoding a polypeptide which includes the amino acid sequence shown in FIG. 17(b) (SEQ ID NO:39). This sequence forms a substantial part of the amino acid sequence shown in FIG. 5(e) (SEQ ID NO:4). Nucleic acid encoding a polypeptide which includes the amino acid sequence shown in FIG. 17(b) (SEQ ID NO:39) may include the coding sequence shown in FIG. 17(b) (SEQ ID NO:39), or an allele, variant, mutant or derivative in similar terms to those discussed above and below for other aspects and embodiments of the present invention.

According to various aspects of the present invention there are also provided various isoforms of the LRP5 polypeptide and gene. The gene of FIG. 5 is known as isoform 1. Included within the present invention is a nucleic acid molecule which has a nucleotide sequence encoding a polypeptide which includes the amino acid sequence of a polypeptide shown in FIG. 11(c) (isoform 2) (SEQ ID NO:25). The coding sequence may be as shown in FIG. 11(b) (SEQ ID NO:24) (which may be included within a molecule which has the sequence shown in FIG. 11(a) (isoform 2) (SEQ ID NO:23) or the sequence shown in FIG. 12(a) (isoform 3) (SEQ ID NO:26)), FIG. 13 (isoform 4) (SEQ ID NO:31), FIG. 14 (isoform 5) (SEQ ID NO:32) and FIG. 15 (isoform 6) (SEQ ID NO:33). Mutants, derivatives, variants and alleles of these sequences are also provided by the present invention, as disclosed.

Further nucleic acid molecules according to the present invention include the nucleotide sequence of any of FIG. 5(a) (SEQ ID NO:1), FIG. 12(b) (SEQ ID NO:27), FIG. 12(e) (SEQ ID NO:30), FIG. 15(b) (SEQ ID NO:34), FIG. 16(a) (SEQ ID NO:35) and FIG. 16(b) (SEQ ID NO:36). They also include nucleic acid encoding the amino acid sequences encoded by FIG. 5(a) (SEQ ID NO:1), FIG. 11(b) (SEQ ID NO:24), FIG. 12(c) (SEQ ID NO:28) or FIG. 16(c) (SEQ ID NO:37), along with mutants, alleles, variants and derivatives of these sequences. Further included are nucleic acid molecules encoding the amino acid sequence of FIG. 18(c) (SEQ ID NO:42), particularly including the coding sequence shown in FIG. 18(b) (SEQ ID NO:41).

Particular alleles according to the present invention have a sequence with a variation indicated in Table 5 or Table 6. One or more of these may be associated with susceptibility to IDDM or other disease. Alterations in a sequence according to the present invention which are associated with IDDM or other disease may be preferred in accordance with embodiments of the present invention. Implications for screening, e.g. for diagnostic or prognostic purposes, are discussed below.

Generally, nucleic acid according to the present invention is provided as an isolate, in isolated and/or purified form, or free or substantially free of material with which it is naturally associated. For example, it may be free or substantially free of nucleic acid flanking the gene in the human genome, except possibly one or more regulatory sequence(s) for expression. Nucleic acid may be wholly or partially synthetic and may include genomic DNA, cDNA or RNA. The coding sequence shown herein is a DNA sequence. Where nucleic acid according to the invention includes RNA, reference to the sequence shown should be construed as encompassing reference to the RNA equivalent, with U substituted for T.

Nucleic acid may be provided as part of a replicable vector. The present invention also provides a vector including nucleic acid as set out above, particularly any expression vector from which the encoded polypeptide can be expressed under appropriate conditions. It further provides a host cell containing any such vector or nucleic acid. An expression vector in this context is a nucleic acid molecule including nucleic acid encoding a polypeptide of interest and appropriate regulatory sequences for expression of the polypeptide. Expression may take place in an in vitro expression system, e.g. reticulocyte lysate, or in vivo, e.g. in eukaryotic cells such as COS or CHO cells or in prokaryotic cells such as E. coli. This is discussed further below.

The nucleic acid sequence provided in accordance with the present invention is useful for identifying nucleic acid of interest (which may be according to the present invention) in a test sample. The present invention provides a method of obtaining nucleic acid of interest. The method includes hybridisation of a probe to target nucleic acid, the probe having the sequence shown in any of FIGS. 5(a), 11(a), 11(b), 12(a), 12(b), 12(c), 12(e), 13, 14, 15, 15(b), 16(a), 16(b), and 16(c), or a complementary sequence. Hybridisation is generally followed by identification of successful hybridisation and isolation of nucleic acid which has hybridised to the probe, which may involve one or more steps of PCR. It will not usually be necessary to use a probe with the complete sequence shown in any of these figures. Shorter fragments may be used, particularly fragments with a sequence encoding the conserved motifs (FIG. 5(c,d), and FIG. 6(a) (SEQ ID NOS:9 to 22)).

Nucleic acid according to the present invention is obtainable using one or more oligonucleotide probes or primers designed to hybridise with one or more fragments of the nucleic acid sequence shown in any of the figures, particularly fragments of relatively rare sequence, based on codon usage or statistical analysis. A primer designed to hybridise with a fragment of the nucleic acid sequence shown in any of the figures may be used in conjunction with one or more oligonucleotides designed to hybridise to a sequence in a cloning vector within which target nucleic acid has been cloned. Alternatively, it may be used in so-called “RACE” (rapid amplification of cDNA ends). In RACE, cDNAs in a library are ligated to an oligonucleotide linker, and PCR is performed using one primer which hybridises with a sequence shown and another which hybridises to the oligonucleotide linker.

Such oligonucleotide probes or primers, as well as the full-length sequence (and mutants, alleles, variants and derivatives), are also useful in screening a test sample containing nucleic acid for the presence of alleles, mutants and variants, with diagnostic and/or prognostic implications as discussed in more detail below.

Nucleic acid isolated and/or purified from one or more cells (e.g. human), or a nucleic acid library derived from nucleic acid isolated and/or purified from cells (e.g. a cDNA library derived from mRNA isolated from the cells), may be probed under conditions for selective hybridisation. It may also be subjected to a specific nucleic acid amplification reaction such as the polymerase chain reaction (PCR) (reviewed for instance in “PCR protocols; A Guide to Methods and Applications”, Eds. Innis et al, 1990, Academic Press, New York; Mullis et al, Cold Spring Harbor Symp. Quant. Biol., 51:263, (1987); Ehrlich (ed), PCR technology, Stockton Press, New York, 1989; and Ehrlich et al, Science, 252:1643-1650, (1991)). PCR comprises steps of denaturation of template nucleic acid (if double-stranded), annealing of primer to target, and polymerisation. The nucleic acid probed or used as template in the amplification reaction may be genomic DNA, cDNA or RNA. Other specific nucleic acid amplification techniques include strand displacement activation, the Qβ replicase system, the repair chain reaction, the ligase chain reaction and ligation activated transcription. For convenience, and because it is generally preferred, the term PCR is used herein in contexts where those skilled in the art may apply other nucleic acid amplification techniques. Unless the context requires otherwise, reference to PCR should be taken to cover use of any suitable nucleic acid amplification reaction available in the art.
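Given a template and a primer pair, a simple sequence calculation predicts the product that the denaturation, annealing and polymerisation steps would yield. The Python sketch below is a minimal exact-match "in silico PCR" on a linear double-stranded template written as its top strand. The forward primer's sequence appears on the top strand. The reverse primer anneals to the top strand, so its reverse complement appears there. Real annealing tolerates some mismatches and depends on temperature and salt, and the sketch models neither. The 6-nucleotide primers and the template are toy values chosen for readability, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def _find_all(seq: str, sub: str) -> list[int]:
    """Start indices of every (possibly overlapping) exact occurrence of sub."""
    hits, i = [], seq.find(sub)
    while i != -1:
        hits.append(i)
        i = seq.find(sub, i + 1)
    return hits


def predict_amplicons(template: str, forward: str, reverse: str,
                      max_length: int = 5000) -> list[str]:
    """Exact-match in silico PCR products, read 5'->3' on the template's top strand.

    Each product runs from a forward-primer site to the end of a downstream
    reverse-primer binding site, up to max_length bases.
    """
    template, forward, reverse = template.upper(), forward.upper(), reverse.upper()
    rev_site = reverse_complement(reverse)
    products = []
    for f in _find_all(template, forward):
        for r in _find_all(template, rev_site):
            end = r + len(rev_site)
            if r >= f and end - f <= max_length:
                products.append(template[f:end])
    return products


if __name__ == "__main__":
    template = "CCCATGCGTACGTAGCTTTGCATCCC"  # hypothetical template (top strand)
    print(predict_amplicons(template, "ATGCGT", "ATGCAA"))  # ['ATGCGTACGTAGCTTTGCAT']
```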

In the context of cloning, it may be necessary for one or more gene fragments to be ligated to generate a full-length coding sequence. Also, where a full-length encoding nucleic acid molecule has not been obtained, a smaller molecule representing part of the full molecule may be used to obtain full-length clones. Inserts may be prepared from partial cDNA clones and used to screen cDNA libraries. The full-length clones isolated may be subcloned into expression vectors and activity assayed by transfection into suitable host cells, e.g. with a reporter plasmid.

A method may include hybridisation of one or more (e.g. two) probes or primers to target nucleic acid. Where the nucleic acid is double-stranded DNA, hybridisation will generally be preceded by denaturation to produce single-stranded DNA. The hybridisation may be part of a PCR procedure, or part of a probing procedure not involving PCR. An example procedure would be a combination of PCR and low stringency hybridisation. A screening procedure, chosen from the many available to those skilled in the art, is used to identify successful hybridisation events and isolate hybridised nucleic acid.

Binding of a probe to target nucleic acid (e.g. DNA) may be measured using any of a variety of techniques at the disposal of those skilled in the art. For instance, probes may be radioactively, fluorescently or enzymatically labelled. Other methods not employing labelling of probe include examination of restriction fragment length polymorphisms, amplification using PCR, RNase cleavage and allele specific oligonucleotide probing. Probing may employ the standard Southern blotting technique. For instance, DNA may be extracted from cells and digested with different restriction enzymes. Restriction fragments may then be separated by electrophoresis on an agarose gel, before denaturation and transfer to a nitrocellulose filter. Labelled probe may be hybridised to the DNA fragments on the filter and binding determined. DNA for probing may be prepared from RNA preparations from cells.
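A restriction fragment length polymorphism arises when a sequence variant creates or destroys a restriction site and so changes the fragment sizes seen on a Southern blot. The Python sketch below computes the fragment lengths from a complete digest of a linear molecule. It searches the top strand only, which is sufficient for palindromic sites such as EcoRI (G^AATTC, cut after the first base). It reports lengths along the top strand and ignores the single-stranded overhangs. The two alleles are invented sequences, and the function names are illustrative.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions (0-based, between bases) for every occurrence of site."""
    seq, site = seq.upper(), site.upper()
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    return cuts


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of the fragments from a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # EcoRI recognises GAATTC and cuts between G and A (G^AATTC), i.e. offset 1.
    allele_1 = "AAAAGAATTCAAAAAAAAAA"  # hypothetical allele carrying the site
    allele_2 = "AAAAGAGTTCAAAAAAAAAA"  # hypothetical A>G variant that destroys the site
    print(fragment_lengths(allele_1, "GAATTC", 1))  # [5, 15]
    print(fragment_lengths(allele_2, "GAATTC", 1))  # [20]
```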

Preliminary experiments may be performed by hybridising various probes under low stringency conditions to Southern blots of DNA digested with restriction enzymes. Suitable conditions would be achieved when a large number of hybridising fragments were obtained while the background hybridisation was low. Using these conditions, nucleic acid libraries, e.g. cDNA libraries representative of expressed sequences, may be searched. Those skilled in the art are well able to employ suitable conditions of the desired stringency for selective hybridisation, taking into account factors such as oligonucleotide length and base composition, temperature and so on. Oligonucleotide probes or primers may be designed on the basis of amino acid sequence information. The design takes into account the degeneracy of the genetic code and, where appropriate, the codon usage of the organism from which the candidate nucleic acid is derived. An oligonucleotide for use in nucleic acid amplification may have about 10 or fewer codons (e.g. 6, 7 or 8), i.e. be about 30 or fewer nucleotides in length (e.g. 18, 21 or 24). Generally, specific primers are upwards of 14 nucleotides in length, but need not be longer than 18-20. Those skilled in the art are well versed in the design of primers for use in processes such as PCR. Various techniques for synthesizing oligonucleotide primers are well known in the art, including phosphotriester and phosphodiester synthesis methods.
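Designing a primer from amino acid sequence while allowing for code degeneracy is usually done by back-translating each residue into IUPAC degenerate nucleotide codes. The Python sketch below assumes the standard genetic code and returns the degenerate primer (sense strand, 5′ to 3′) together with its fold degeneracy, that is, the number of distinct oligonucleotides in the mixture. Per-position IUPAC codes over-represent amino acids with six codons (Leu, Ser, Arg). For example, Leu becomes YTN, which stands for 8 sequences rather than 6. The sketch does not apply codon-usage weighting. The function name and the YWTD example, taken from the motif described earlier, are illustrative. A primer of this kind would usually span 6 to 8 codons rather than the 4 shown.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, b1 in enumerate(BASES)
               for j, b2 in enumerate(BASES)
               for k, b3 in enumerate(BASES)}

# IUPAC nucleotide codes keyed by the set of bases each one stands for.
IUPAC = {frozenset(bases): code for bases, code in [
    ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
    ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
    ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"), ("ACGT", "N"),
]}


def degenerate_primer(peptide: str) -> tuple[str, int]:
    """Back-translate a peptide into an IUPAC-coded primer and its fold degeneracy."""
    primer, fold = [], 1
    for aa in peptide.upper():
        codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa}")
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in codons)
            primer.append(IUPAC[bases])
            fold *= len(bases)
    return "".join(primer), fold


if __name__ == "__main__":
    print(degenerate_primer("YWTD"))  # ('TAYTGGACNGAY', 16)
    print(degenerate_primer("L"))     # ('YTN', 8)
```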

Preferred amino acid sequences suitable for use in the design of probes or PCR primers may include sequences that are conserved (completely, substantially or partly), such as those of the motifs present in LRP5 (FIG. 5(d) (SEQ ID NO:3)).

A further aspect of the present invention provides an oligonucleotide or polynucleotide fragment of the nucleotide sequence shown in any of the figures herein providing nucleic acid according to the present invention, or a complementary sequence, in particular for use in a method of obtaining and/or screening nucleic acid. Some preferred oligonucleotides have a sequence shown in Table 2 (SEQ ID NOS:49-54), Table 4 (SEQ ID NOS:83-317), Table 7 (SEQ ID NOS:240-317), Table 8 (SEQ ID NOS:318-333) or Table 9 (SEQ ID NOS:49-74, 334-402). Others have a sequence which differs from any of the sequences shown by addition, substitution, insertion or deletion of one or more nucleotides, preferably without abolishing the ability to hybridise selectively with nucleic acid in accordance with the present invention, that is, wherein the degree of similarity of the oligonucleotide or polynucleotide with one of the sequences given is sufficiently high.

In some preferred embodiments, oligonucleotides according to the present invention that are fragments of any of the sequences shown, or of any allele associated with IDDM or other disease susceptibility, are at least about 10 nucleotides in length, more preferably at least about 15 nucleotides in length, more preferably at least about 20 nucleotides in length. Such fragments themselves individually represent aspects of the present invention. Fragments and other oligonucleotides may be used as primers or probes as discussed. They may also be generated (e.g. by PCR) in methods concerned with determining the presence in a test sample of a sequence indicative of IDDM or other disease susceptibility.

Methods involving use of nucleic acid in diagnostic and/or prognostic contexts, for instance in determining susceptibility to IDDM or other disease, and other methods concerned with determining the presence of sequences indicative of IDDM or other disease susceptibility, are discussed below.

Further embodiments of oligonucleotides according to the present invention are anti-sense oligonucleotide sequences based on the nucleic acid sequences described herein. Anti-sense oligonucleotides may be designed to hybridise to the complementary sequence of nucleic acid, pre-mRNA or mature mRNA. They interfere with the production of polypeptide encoded by a given DNA sequence (e.g. either native polypeptide or a mutant form thereof), so that its expression is reduced or prevented altogether. Anti-sense techniques may be used to target a coding sequence or a control sequence of a gene, e.g. in the 5′ flanking sequence, whereby the antisense oligonucleotides can interfere with control sequences. Anti-sense oligonucleotides may be DNA or RNA and may be around 14-23 nucleotides, particularly around 15-18 nucleotides, in length. The construction of antisense sequences and their use is described in Peyman and Uhlmann, Chemical Reviews, 90:543-584, (1990), and Crooke, Ann. Rev. Pharmacol. Toxicol., 32:329-376, (1992).
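An anti-sense oligonucleotide is the reverse complement of the stretch of target RNA it is meant to hybridise with. The Python sketch below derives that sequence and tiles candidate oligonucleotides of a chosen length along a target. The default of 15 nucleotides falls in the 15-18 range mentioned above. The sketch covers sequence complementarity only. It does not address chemistry, target-site accessibility or off-target binding, and the function names and example target are illustrative.

```python
# Complement map that accepts DNA (T) or RNA (U) input and emits DNA bases.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense(target: str, as_rna: bool = False) -> str:
    """Anti-sense oligo (5'->3') complementary to a target written 5'->3'."""
    oligo = target.upper().translate(_COMPLEMENT)[::-1]
    return oligo.replace("T", "U") if as_rna else oligo


def tile_antisense(target: str, length: int = 15, step: int = 1,
                   as_rna: bool = False) -> list[tuple[int, str]]:
    """(0-based start in target, anti-sense oligo) for each window of the given length."""
    target = target.upper()
    return [(start, antisense(target[start:start + length], as_rna))
            for start in range(0, len(target) - length + 1, step)]


if __name__ == "__main__":
    mrna = "AUGGCCAUU"  # hypothetical mRNA fragment
    print(tile_antisense(mrna, length=6, step=3))  # [(0, 'GGCCAT'), (3, 'AATGGC')]
```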

Nucleic acid according to the present invention may be used in methods of gene therapy, for instance in treatment of individuals with the aim of preventing or curing (wholly or partially) IDDM or other disease. This may ease one or more symptoms of the disease. This is discussed below.

Nucleic acid according to the present invention, such as a full-length coding sequence or oligonucleotide probe or primer, may be provided as part of a kit, e.g. in a suitable container such as a vial in which the contents are protected from the external environment. The kit may include instructions for use of the nucleic acid, e.g. in PCR and/or a method for determining the presence of nucleic acid of interest in a test sample. A kit wherein the nucleic acid is intended for use in PCR may include one or more other reagents required for the reaction, such as polymerase, nucleotides, buffer solution etc. The nucleic acid may be labelled. A kit for use in determining the presence or absence of nucleic acid of interest may include one or more articles and/or reagents for performance of the method. These may include means for providing the test sample itself, e.g. a swab for removing cells from the buccal cavity or a syringe for removing a blood sample (such components generally being sterile).

According to a further aspect, the present invention provides a nucleic acid molecule including an LRP5 gene promoter.

In another aspect, the present invention provides a nucleic acid molecule including a promoter, the promoter including the sequence of nucleotides shown in FIG. 12(e) (SEQ ID NO:30) or FIG. 15(b) (SEQ ID NO:34). The promoter may comprise one or more fragments of the sequence shown in FIG. 12(e) (SEQ ID NO:30) or FIG. 15(b) (SEQ ID NO:34), sufficient to promote gene expression. The promoter may comprise or consist essentially of a sequence of nucleotides 5′ to the LRP5 gene in the human chromosome, or an equivalent sequence in another species, such as the mouse.

Any of the sequences disclosed in the figures herein may be used to construct a probe for use in identification and isolation of a promoter from a genomic library containing a genomic LRP5 gene. Techniques and conditions for such probing are well known in the art and are discussed elsewhere herein. To find minimal elements or motifs responsible for tissue and/or developmental regulation, restriction enzymes or nucleases may be used to digest a nucleic acid molecule, followed by an appropriate assay (for example using a reporter gene such as luciferase) to determine the sequence required. A preferred embodiment of the present invention provides a nucleic acid isolate with the minimal portion of the nucleotide sequence shown in FIG. 12(e) (SEQ ID NO:30) or FIG. 15(b) (SEQ ID NO:34) required for promoter activity.
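The deletion-and-assay approach to locating a minimal promoter can be summarised as a selection step over a set of tested constructs. The Python sketch below is illustrative only: it assumes each construct is recorded as the start and end positions of the retained promoter fragment plus its reporter read-out, and it treats a construct as active if it keeps at least a chosen fraction of full-length activity (50% here, an arbitrary assumption). The function name and the example numbers are hypothetical.

```python
def minimal_active_fragment(constructs, full_length_activity, min_fraction=0.5):
    """Pick the shortest deletion construct that still shows promoter activity.

    constructs: list of (start, end, activity) tuples, where start/end are
        positions of the retained promoter fragment and activity is the
        reporter read-out (e.g. luciferase units) for that construct.
    full_length_activity: read-out for the undeleted promoter.
    min_fraction: fraction of full-length activity regarded as 'retained'
        (an illustrative assumption, not a value from the text).
    Returns the qualifying (start, end, activity) tuple, or None.
    """
    threshold = min_fraction * full_length_activity
    active = [c for c in constructs if c[2] >= threshold]
    if not active:
        return None
    # Shortest retained fragment among those that keep activity.
    return min(active, key=lambda c: c[1] - c[0])
```

With hypothetical constructs `[(1, 1000, 950), (300, 1000, 900), (600, 1000, 700), (800, 1000, 100)]` and a full-length read-out of 1000, the function returns `(600, 1000, 700)`: the 800-1000 fragment has lost most of its activity, so the 600-1000 fragment is the smallest one that still works.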

As noted, the promoter may comprise one or more sequence motifs or elements conferring developmental and/or tissue-specific regulatory control of expression. Other regulatory sequences may be included, for instance as identified by mutation or digest assay in an appropriate expression system or by sequence comparison with available information, e.g. using a computer to search on-line databases.

By “promoter” is meant a sequence of nucleotides from which transcription of DNA operably linked downstream (i.e. in the 3′ direction on the sense strand of double-stranded DNA) may be initiated.

“Operably linked” means joined as part of the same nucleic acid molecule, suitably positioned and oriented for transcription to be initiated from the promoter. DNA operably linked to a promoter is “under transcriptional initiation regulation” of the promoter.

The present invention extends to a promoter which has a nucleotide sequence that is an allele, mutant, variant or derivative, by way of nucleotide addition, insertion, substitution or deletion, of a promoter sequence as provided herein. Preferred levels of sequence homology with a provided sequence may be analogous to those set out above for encoding nucleic acid and polypeptides according to the present invention. Systematic or random mutagenesis of nucleic acid to make an alteration to the nucleotide sequence may be performed using any technique known to those skilled in the art. One or more alterations to a promoter sequence according to the present invention may increase or decrease promoter activity, or increase or decrease the magnitude of the effect of a substance able to modulate the promoter activity.

“Promoter activity” is used to refer to ability to initiate transcription. The level of promoter activity is quantifiable for instance by assessment of the amount of mRNA produced by transcription from the promoter or by assessment of the amount of protein product produced by translation of mRNA produced by transcription from the promoter. The amount of a specific mRNA present in an expression system may be determined for example using specific oligonucleotides which are able to hybridise with the mRNA and which are labelled or may be used in a specific amplification reaction such as the polymerase chain reaction. Use of a reporter gene facilitates determination of promoter activity by reference to protein production.

Further provided by the present invention is a nucleic acid construct comprising a LRP5 promoter region, or a fragment, mutant, allele, derivative or variant thereof able to promote transcription, operably linked to a heterologous gene, e.g. a coding sequence. A “heterologous” or “exogenous” gene is generally not a modified form of LRP5. Generally, the gene may be transcribed into mRNA which may be translated into a peptide or polypeptide product which may be detected and preferably quantitated following expression. A gene whose encoded product may be assayed following expression is termed a “reporter gene”, i.e. a gene which “reports” on promoter activity.

The reporter gene preferably encodes an enzyme which catalyses a reaction which produces a detectable signal, preferably a visually detectable signal, such as a coloured product. Many examples are known, including β-galactosidase and luciferase. β-galactosidase activity may be assayed by production of blue colour on substrate, the assay being by eye or by use of a spectrophotometer to measure absorbance. Light emission, for example that produced as a result of luciferase activity, may be quantitated using a luminometer. Radioactive assays may be used, for instance using chloramphenicol acetyltransferase, which may also be used in non-radioactive assays. The presence and/or amount of gene product resulting from expression from the reporter gene may be determined using a molecule able to bind the product, such as an antibody or fragment thereof. The binding molecule may be labelled directly or indirectly using any standard technique.

Those skilled in the art are well aware of a multitude of possible reporter genes and assay techniques which may be used to determine gene activity. Any suitable reporter/assay may be used and it should be appreciated that no particular choice is essential to or a limitation of the present invention.

Nucleic acid constructs comprising a promoter (as disclosed herein) and a heterologous gene (reporter) may be employed in screening for a substance able to modulate activity of the promoter. For therapeutic purposes, e.g. for treatment of IDDM or other disease, a substance able to up-regulate expression from the promoter may be sought. A method of screening for ability of a substance to modulate activity of a promoter may comprise contacting an expression system, such as a host cell, containing a nucleic acid construct as herein disclosed with a test or candidate substance and determining expression of the heterologous gene.

The level of expression in the presence of the test substance may be compared with the level of expression in the absence of the test substance. A difference in expression in the presence of the test substance indicates ability of the substance to modulate gene expression. An increase in expression of the heterologous gene compared with expression of another gene not linked to a promoter as disclosed herein indicates specificity of the substance for modulation of the promoter.
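The two comparisons just described, treated against untreated and promoter construct against a control gene, can be expressed as fold changes. The Python sketch below assumes single numeric reporter read-outs (for example luminescence units) and an arbitrary 1.5-fold cut-off. The function names, the cut-off and the use of a ratio of fold changes as the specificity test are illustrative assumptions, not a protocol taken from the text.

```python
def fold_change(treated, untreated):
    """Ratio of reporter signal with vs. without the test substance."""
    if untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return treated / untreated


def assess_promoter_modulator(lrp5_treated, lrp5_untreated,
                              ctrl_treated, ctrl_untreated, min_change=1.5):
    """Compare an LRP5 promoter-reporter construct with a control gene.

    lrp5_*: reporter signal from the construct driven by the LRP5 promoter.
    ctrl_*: signal from a control gene not linked to that promoter.
    min_change: illustrative fold-change cut-off (an assumption).
    """
    test_fc = fold_change(lrp5_treated, lrp5_untreated)
    ctrl_fc = fold_change(ctrl_treated, ctrl_untreated)
    if ctrl_fc == 0:
        raise ValueError("control signal vanished; specificity undefined")
    relative = test_fc / ctrl_fc

    def beyond(ratio):
        # True when the ratio passes the cut-off in either direction.
        return ratio >= min_change or ratio <= 1 / min_change

    return {
        "fold_change": test_fc,
        "modulates": beyond(test_fc),
        # Specific: the promoter construct changes more than the control gene.
        "specific": beyond(test_fc) and beyond(relative),
    }
```

For example, a substance that triples the LRP5-promoter signal (3000 vs 1000 units) while barely changing the control (1100 vs 1000) is flagged as both modulating and specific. One that triples both signals is flagged as modulating but not specific, since it likely acts on expression generally.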

A promoter construct may be introduced into a cell line using any technique previously described to produce a stable cell line containing the reporter construct integrated into the genome. The cells may be grown and incubated with test compounds for varying times. The cells may be grown in 96-well plates to facilitate the analysis of large numbers of compounds. The cells may then be washed and the reporter gene expression analysed. For some reporters, such as luciferase, the cells will be lysed and then analysed.

Following identification of a substance which modulates or affects promoter activity, the substance may be investigated further. Furthermore, it may be manufactured and/or used in preparation, i.e. manufacture or formulation, of a composition such as a medicament, pharmaceutical composition or drug. These may be administered to individuals.

Thus, the present invention extends in various aspects not only to a substance identified using a nucleic acid molecule as a modulator of promoter activity, in accordance with what is disclosed herein, but also to a pharmaceutical composition, medicament, drug or other composition comprising such a substance, a method comprising administration of such a composition to a patient, e.g. for increasing LRP5 expression for instance in treatment (which may include preventative treatment) of IDDM or other disease, use of such a substance in manufacture of a composition for administration, e.g. for increasing LRP5 expression for instance in treatment of IDDM or other disease, and a method of making a pharmaceutical composition comprising admixing such a substance with a pharmaceutically acceptable excipient, vehicle or carrier, and optionally other ingredients.

A further aspect of the present invention provides a polypeptide which has the amino acid sequence shown in FIG. 5(c) (SEQ ID NO:3), which may be in isolated and/or purified form, free or substantially free of material with which it is naturally associated, such as other polypeptides or such as human polypeptides other than that for which the amino acid sequence is shown in FIG. 5(c) (SEQ ID NO:3), or (for example if produced by expression in a prokaryotic cell) lacking in native glycosylation, e.g. unglycosylated. Further polypeptides according to the present invention have an amino acid sequence selected from that of the polypeptide shown in FIG. 11(c) (SEQ ID NO:25), that shown in FIG. 12(d), and that of the partial polypeptide shown in FIG. 16(d) (SEQ ID NO:8).

Polypeptides which are amino acid sequence variants, alleles, derivatives or mutants are also provided by the present invention. A polypeptide which is a variant, allele, derivative or mutant may have an amino acid sequence which differs from that given in a figure herein by one or more of addition, substitution, deletion and insertion of one or more amino acids. Preferred such polypeptides have LRP5 function, that is to say have one or more of the following properties: immunological cross-reactivity with an antibody reactive with the polypeptide for which the sequence is given in a figure herein; sharing an epitope with the polypeptide for which the amino acid sequence is shown in a figure herein (as determined for example by immunological cross-reactivity between the two polypeptides); a biological activity which is inhibited by an antibody raised against the polypeptide whose sequence is shown in a figure herein; ability to reduce serum triglyceride; ability to reduce serum cholesterol; ability to interact with and/or reduce serum levels of very low-density lipoprotein particles; ability to affect serum alkaline phosphatase levels. Alteration of sequence may change the nature and/or level of activity and/or stability of the LRP5 protein.

A polypeptide which is an amino acid sequence variant, allele, derivative or mutant of the amino acid sequence shown in a figure herein may comprise an amino acid sequence which shares greater than about 35% sequence identity with the sequence shown, greater than about 40%, greater than about 50%, greater than about 60%, greater than about 70%, greater than about 80%, greater than about 90% or greater than about 95%. The sequence may share greater than about 60% similarity, greater than about 70% similarity, greater than about 80% similarity or greater than about 90% similarity with the amino acid sequence shown in the relevant figure. Amino acid similarity is generally defined with reference to the algorithm GAP (Genetics Computer Group, Madison, Wis.) as noted above, or the TBLASTN program of Altschul et al. (1990) J. Mol. Biol. 215: 403-10. Similarity allows for “conservative variation”, i.e. substitution of one hydrophobic residue such as isoleucine, valine, leucine or methionine for another, or the substitution of one polar residue for another, such as arginine for lysine, glutamic acid for aspartic acid, or glutamine for asparagine. Particular amino acid sequence variants may differ from that shown in a figure herein by insertion, addition, substitution or deletion of 1 amino acid, 2, 3, 4, 5-10, 10-20, 20-30, 30-50, 50-100, 100-150, or more than 150 amino acids.
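The difference between identity and similarity can be shown on two sequences that have already been aligned. The Python sketch below is not GAP or TBLASTN, which perform the alignment themselves and score substitutions with full matrices. It assumes a pre-computed alignment of equal length (gaps written as `-`), counts only gap-free columns in the denominator (an assumption), and treats as conservative only the residue groups named in the paragraph above. The function names are hypothetical.

```python
# Conservative-substitution groups taken from the examples in the text.
CONSERVATIVE_GROUPS = [
    set("IVLM"),  # hydrophobic: isoleucine, valine, leucine, methionine
    set("RK"),    # arginine / lysine
    set("ED"),    # glutamic acid / aspartic acid
    set("QN"),    # glutamine / asparagine
]


def is_conservative(a, b):
    """True if residues a and b fall in the same conservative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity and percent similarity over gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no aligned residue pairs")
    identical = sum(x == y for x, y in columns)
    similar = sum(x == y or is_conservative(x, y) for x, y in columns)
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n
```

Comparing the peptide EVLFTTGLIRPVALVVDN (SEQ ID NO:405) with a hypothetical variant DILFTTGLKRPVALVVEN gives 14 identical positions out of 18 (about 77.8% identity). Three of the four changes (E/D, V/I, D/E) are conservative, so similarity is 17 of 18 (about 94.4%).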

Sequence comparison may be made over the full length of the relevant sequence shown herein, or may more preferably be over a contiguous sequence of about or greater than about 20, 25, 30, 33, 40, 50, 67, 133, 167, 200, 233, 267, 300, 333, 400, 450, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, or more amino acids or nucleotide triplets, compared with the relevant amino acid sequence or nucleotide sequence as the case may be.
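Restricting the comparison to a contiguous stretch means that a variant can reach a given identity level over, say, 20 or 50 residues even if it diverges elsewhere. A minimal Python sketch, assuming two sequences already aligned to equal length without any further gapping, finds the best identity over any contiguous window of a chosen size using a running count. The function name is hypothetical, and a real analysis would normally rely on the alignment programs mentioned above.

```python
def best_window_identity(aligned_a, aligned_b, window):
    """Highest percent identity over any contiguous window of `window` columns.

    Assumes the two sequences are already aligned to equal length; a column
    counts as a match only if both residues are equal and not a gap ('-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not 0 < window <= len(aligned_a):
        raise ValueError("window must be between 1 and the alignment length")
    matches = [int(x == y and x != "-") for x, y in zip(aligned_a, aligned_b)]
    current = sum(matches[:window])
    best = current
    # Slide the window one column at a time, updating the match count.
    for end in range(window, len(matches)):
        current += matches[end] - matches[end - window]
        best = max(best, current)
    return 100.0 * best / window
```

For the toy pair `"AAAAACCCCC"` and `"AAAAAGGGGG"`, a 5-column window gives 100% (the first five columns), whereas comparing over the full 10 columns gives 50%.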

The present invention also includes active portions, fragments, derivatives and functional mimetics of the polypeptides of the invention. An “active portion” of a polypeptide means a peptide which is less than said full-length polypeptide, but which retains a biological activity, such as a biological activity selected from binding to ligand and involvement in endocytosis. Thus an active portion of the LRP5 polypeptide may, in one embodiment, include the transmembrane domain and the portion of the cytoplasmic tail involved in endocytosis. Such an active fragment may be included as part of a fusion protein, e.g. including a binding portion for a different ligand. In different embodiments, combinations of LDL and EGF motifs may be included in a molecule to confer on the molecule different binding specificities.

A “fragment” of a polypeptide generally means a stretch of amino acid residues of at least about five contiguous amino acids, often at least about seven contiguous amino acids, typically at least about nine contiguous amino acids, more preferably at least about 13 contiguous amino acids, and, more preferably, at least about 20 to 30 or more contiguous amino acids. Fragments of the LRP5 polypeptide sequence may include antigenic determinants or epitopes useful for raising antibodies to a portion of the amino acid sequence. Alanine scans are commonly used to find and refine peptide motifs within polypeptides, this involving the systematic replacement of each residue in turn with the amino acid alanine, followed by an assessment of biological activity.

Preferred fragments of LRP5 include those with any of the following amino acid sequences:

SYFHLFPPPPSPCTDSS (SEQ ID NO:403)

VDGRQNIKRAKDDGT (SEQ ID NO:404)

EVLFTTGLIRPVALVVDN (SEQ ID NO:405)

IQGHLDFVMDILVFHS (SEQ ID NO:406),

which may be used for instance in raising or isolating antibodies. Variant and derivative peptides, i.e. peptides which have an amino acid sequence which differs from one of these sequences by way of addition, insertion, deletion or substitution of one or more amino acids, are also provided by the present invention, generally with the proviso that the variant or derivative peptide is bound by an antibody or other specific binding member which binds one of the peptides whose sequence is shown. A peptide which is a variant or derivative of one of the shown peptides may compete with the shown peptide for binding to a specific binding member, such as an antibody or antigen-binding fragment thereof.
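One systematic way to generate single-substitution variants of the peptides listed above is the alanine scan described earlier, in which each residue is replaced by alanine in turn. The Python sketch below only enumerates the variant sequences. The subsequent assessment (for example whether each variant is still bound by an antibody against the parent peptide) is experimental and is not modelled. The function name is hypothetical.

```python
def alanine_scan(peptide):
    """Yield (position, variant) with each non-alanine residue replaced by A.

    Positions are 1-based. Residues that are already alanine are skipped,
    because substituting them would leave the sequence unchanged.
    """
    for i, residue in enumerate(peptide):
        if residue != "A":
            yield i + 1, peptide[:i] + "A" + peptide[i + 1:]
```

Applied to VDGRQNIKRAKDDGT (SEQ ID NO:404), which is 15 residues long and already has alanine at position 10, `list(alanine_scan("VDGRQNIKRAKDDGT"))` gives 14 variants, the first being `(1, "ADGRQNIKRAKDDGT")`.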

A “derivative” of a polypeptide or a fragment thereof may include a polypeptide modified by varying the amino acid sequence of the protein, e.g. by manipulation of the nucleic acid encoding the protein or by altering the protein itself. Such derivatives of the natural amino acid sequence may involve one or more of insertion, addition, deletion or substitution of one or more amino acids, which may be without fundamentally altering the qualitative nature of biological activity of the wild type polypeptide. Also encompassed within the scope of the present invention are functional mimetics of active fragments of the LRP5 polypeptides provided (including alleles, mutants, derivatives and variants). The term “functional mimetic” means a substance which may not contain an active portion of the relevant amino acid sequence, and probably is not a peptide at all, but which retains in qualitative terms biological activity of natural LRP5 polypeptide. The design and screening of candidate mimetics is described in detail below.

Sequences of amino acid sequence variants representative of preferred embodiments of the present invention are shown in Table 5 and Table 6. Screening for the presence of one or more of these in a test sample has a diagnostic and/or prognostic use, for instance in determining IDDM or other disease susceptibility, as discussed below.

Other fragments of the polypeptides for which sequence information is provided herein are provided as aspects of the present invention, for instance corresponding to functional domains. One such functional domain is the putative extracellular domain, such that a polypeptide fragment according to the present invention may include the extracellular domain of the polypeptide of which the amino acid sequence is shown in FIG. 5(e) (SEQ ID NO:4) or FIG. 5(c) (SEQ ID NO:3). This runs to amino acid 1385 of the precursor sequence of FIG. 5(c) (SEQ ID NO:3). Another useful LRP5 domain is the cytoplasmic domain, of 207 amino acids, shown in FIG. 5(d) (SEQ ID NO:3). This may be used in targeting proteins to move through the endocytotic pathway.
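Given the full precursor sequence from the figure, the two domain fragments can be extracted by position. The Python sketch below assumes 1-based, inclusive residue numbering, so that "runs to amino acid 1385" means the first 1385 residues. It also assumes that the 207-residue cytoplasmic domain is the C-terminal end of the precursor, as is usual for single-pass receptors of the LDL-receptor family. The function name is hypothetical, and the toy sequence in the example is made up rather than taken from FIG. 5(c).

```python
def split_domains(precursor, ecd_end=1385, cyto_len=207):
    """Return (extracellular, cytoplasmic) segments of a precursor sequence.

    ecd_end: last residue (1-based, inclusive) of the extracellular domain.
    cyto_len: number of C-terminal residues taken as the cytoplasmic domain
        (assumes the cytoplasmic domain lies at the C terminus).
    """
    if cyto_len <= 0 or ecd_end <= 0:
        raise ValueError("domain sizes must be positive")
    if ecd_end + cyto_len > len(precursor):
        raise ValueError("sequence too short for the requested domains")
    extracellular = precursor[:ecd_end]
    cytoplasmic = precursor[-cyto_len:]
    return extracellular, cytoplasmic
```

On a made-up 17-residue sequence `"E" * 10 + "T" * 3 + "C" * 4`, calling `split_domains(seq, ecd_end=10, cyto_len=4)` returns the ten `E` residues and the four `C` residues. The three residues in between would correspond to the region containing the membrane-spanning segment.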

A polypeptide according to the present invention may be isolated and/or purified (e.g. using an antibody) for instance after production by expression from encoding nucleic acid (for which see below). Thus, a polypeptide may be provided free or substantially free from contaminants with which it is naturally associated (if it is a naturally-occurring polypeptide). A polypeptide may be provided free or substantially free of other polypeptides. Polypeptides according to the present invention may be generated wholly or partly by chemical synthesis. The isolated and/or purified polypeptide may be used in formulation of a composition, which may include at least one additional component, for example a pharmaceutical composition including a pharmaceutically acceptable excipient, vehicle or carrier. A composition including a polypeptide according to the invention may be used in prophylactic and/or therapeutic treatment as discussed below.

A polypeptide, peptide fragment, allele, mutant, derivative or variant according to the present invention may be used as an immunogen or otherwise in obtaining specific antibodies. Antibodies are useful in purification and other manipulation of polypeptides and peptides, diagnostic screening and therapeutic contexts. This is discussed further below.

A polypeptide according to the present invention may be used in screening for molecules which affect or modulate its activity or function, e.g. binding to ligand, involvement in endocytosis, movement from an intracellular compartment to the cell surface, or movement from the cell surface to an intracellular compartment. Such molecules may interact with the ligand-binding portion of LRP5, the cytoplasmic portion of LRP5, or with one or more accessory molecules, e.g. those involved in movement of vesicles containing LRP5 to and from the cell surface, and may be useful in a therapeutic (possibly including prophylactic) context.

It is well known that pharmaceutical research leading to the identification of a new drug may involve the screening of very large numbers of candidate substances, both before and even after a lead compound has been found. This is one factor which makes pharmaceutical research very expensive and time-consuming. Means for assisting in the screening process can have considerable commercial importance and utility. Such means for screening for substances potentially useful in treating or preventing IDDM or other disease are provided by polypeptides according to the present invention. Substances identified as modulators of the polypeptide represent an advance in the fight against IDDM and other diseases since they provide a basis for design and investigation of therapeutics for in vivo use. Furthermore, they may be useful in any of a number of conditions, including autoimmune diseases, such as glomerulonephritis, diseases and disorders involving disruption of endocytosis and/or antigen presentation, diseases and disorders involving cytokine clearance and/or inflammation, viral infection, pathogenic bacterial toxin contamination, elevation of free fatty acids or hypercholesterolemia, type 2 diabetes, osteoporosis, and Alzheimer's disease, given the functional indications for LRP5 discussed elsewhere herein. As noted elsewhere, LRP5, fragments thereof, and nucleic acid according to the invention may also be useful in combatting any of these diseases and disorders.

A method of screening for a substance which modulates activity of a polypeptide may include contacting one or more test substances with the polypeptide in a suitable reaction medium, testing the activity of the treated polypeptide and comparing that activity with the activity of the polypeptide in comparable reaction medium untreated with the test substance or substances. A difference in activity between the treated and untreated polypeptides is indicative of a modulating effect of the relevant test substance or substances.
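The treated-versus-untreated comparison can be sketched for a panel of test substances measured in replicate. In the Python sketch below, each substance's mean activity is expressed as a percentage of the mean untreated activity, and a substance is flagged when that percentage differs from 100% by more than a cut-off. The 25% cut-off, the replicate layout and the function names are illustrative assumptions. A real screen would normally also apply an appropriate statistical test.

```python
from statistics import mean


def percent_of_control(treated_reps, untreated_reps):
    """Mean activity with a test substance as a percentage of untreated mean."""
    baseline = mean(untreated_reps)
    if baseline == 0:
        raise ValueError("untreated activity must be non-zero")
    return 100.0 * mean(treated_reps) / baseline


def flag_modulators(results, untreated_reps, cutoff=25.0):
    """Return {substance: percent_of_control} for apparent modulators.

    results: mapping of substance name -> list of replicate activity values.
    cutoff: minimum deviation from 100% of control, in percentage points,
        for a substance to be flagged (an illustrative assumption).
    """
    hits = {}
    for name, reps in results.items():
        pct = percent_of_control(reps, untreated_reps)
        if abs(pct - 100.0) > cutoff:
            hits[name] = pct
    return hits
```

With hypothetical untreated replicates `[100, 102, 98]` and results `{"cmpd_A": [50, 52, 48], "cmpd_B": [99, 101, 100], "cmpd_C": [160, 150, 170]}`, the function flags `cmpd_A` (about 50% of control, an apparent inhibitor) and `cmpd_C` (about 160%, an apparent activator), but not `cmpd_B`.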

Combinatorial library technology (Schultz, J S (1996) Biotechnol. Prog. 12:729-743) provides an efficient way of testing a potentially vast number of different substances for ability to modulate activity of a polypeptide. Prior to or as well as being screened for modulation of activity, test substances may be screened for ability to interact with the polypeptide, e.g. in a yeast two-hybrid system (which requires that both the polypeptide and the test substance can be expressed in yeast from encoding nucleic acid). This may be used as a coarse screen prior to testing a substance for actual ability to modulate activity of the polypeptide.

Following identification of a substance which modulates or affects polypeptide activity, the substance may be investigated further. Furthermore, it may be manufactured and/or used in preparation, i.e. manufacture or formulation, of a composition such as a medicament, pharmaceutical composition or drug. These may be administered to individuals.

Thus, the present invention extends in various aspects not only to a substance identified as a modulator of polypeptide activity, in accordance with what is disclosed herein, but also to a pharmaceutical composition, medicament, drug or other composition comprising such a substance, a method comprising administration of such a composition to a patient, e.g. for treatment (which may include preventative treatment) of IDDM or other disease, use of such a substance in manufacture of a composition for administration, e.g. for treatment of IDDM or other disease, and a method of making a pharmaceutical composition comprising admixing such a substance with a pharmaceutically acceptable excipient, vehicle or carrier, and optionally other ingredients.

A substance identified as a modulator of polypeptide or promoter function may be peptide or non-peptide in nature. Non-peptide “small molecules” are often preferred for many in vivo pharmaceutical uses. Accordingly, a mimetic or mimic of the substance (particularly if a peptide) may be designed for pharmaceutical use. The designing of mimetics to a known pharmaceutically active compound is a known approach to the development of pharmaceuticals based on a “lead” compound. This might be desirable where the active compound is difficult or expensive to synthesise or where it is unsuitable for a particular method of administration, e.g. peptides are not well suited as active agents for oral compositions as they tend to be quickly degraded by proteases in the alimentary canal. Mimetic design, synthesis and testing may be used to avoid randomly screening large numbers of molecules for a target property.

There are several steps commonly taken in the design of a mimetic from a compound having a given target property. Firstly, the particular parts of the compound that are critical and/or important in determining the target property are determined. In the case of a peptide, this can be done by systematically varying the amino acid residues in the peptide, e.g. by substituting each residue in turn. These parts or residues constituting the active region of the compound are known as its “pharmacophore”.

Once the pharmacophore has been found, its structure is modelled according to its physical properties, e.g. stereochemistry, bonding, size and/or charge, using data from a range of sources, e.g. spectroscopic techniques, X-ray diffraction data and NMR. Computational analysis, similarity mapping (which models the charge and/or volume of a pharmacophore, rather than the bonding between atoms) and other techniques can be used in this modelling process.

In a variant of this approach, the three-dimensional structure of the ligand and its binding partner are modelled. This can be especially useful where the ligand and/or binding partner change conformation on binding, allowing the model to take account of this in the design of the mimetic.

A template molecule is then selected onto which chemical groups which mimic the pharmacophore can be grafted. The template molecule and the chemical groups grafted on to it can conveniently be selected so that the mimetic is easy to synthesise, is likely to be pharmacologically acceptable, and does not degrade in vivo, while retaining the biological activity of the lead compound. The mimetic or mimetics found by this approach can then be screened to see whether they have the target property, or to what extent they exhibit it. Further optimisation or modification can then be carried out to arrive at one or more final mimetics for in vivo or clinical testing.

Mimetics of substances identified as having ability to modulate LRP5 polypeptide or promoter activity using a screening method as disclosed herein are included within the scope of the present invention. A polypeptide, peptide or substance able to modulate activity of a polypeptide according to the present invention may be provided in a kit, e.g. sealed in a suitable container which protects its contents from the external environment. Such a kit may include instructions for use.

A convenient way of producing a polypeptide according to the present invention is to express nucleic acid encoding it, by use of the nucleic acid in an expression system. Accordingly, the present invention also encompasses a method of making a polypeptide (as disclosed), the method including expression from nucleic acid encoding the polypeptide (generally nucleic acid according to the invention). This may conveniently be achieved by growing in culture a host cell containing such nucleic acid, e.g. in a vector, under appropriate conditions which cause or allow expression of the polypeptide. Polypeptides may also be expressed in in vitro systems, such as reticulocyte lysate.

Systems for cloning and expression of a polypeptide in a variety of different host cells are well known. Suitable host cells include bacteria, eukaryotic cells such as mammalian and yeast, and baculovirus systems. Mammalian cell lines available in the art for expression of a heterologous polypeptide include Chinese hamster ovary cells, HeLa cells, baby hamster kidney cells, COS cells and many others. A common, preferred bacterial host is E. coli. Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate. Vectors may be plasmids, viral (e.g. phage) or phagemid, as appropriate. For further details see, for example, Molecular Cloning: a Laboratory Manual: 2nd edition, Sambrook et al., 1989, Cold Spring Harbor Laboratory Press. Many known techniques and protocols for manipulation of nucleic acid, for example in preparation of nucleic acid constructs, mutagenesis, sequencing, introduction of DNA into cells and gene expression, and analysis of proteins, are described in detail in Current Protocols in Molecular Biology, Ausubel et al. eds., John Wiley & Sons, 1992.

Thus, a further aspect of the present invention provides a host cell containing nucleic acid as disclosed herein. The nucleic acid of the invention may be integrated into the genome (e.g. chromosome) of the host cell. Integration may be promoted by inclusion of sequences which promote recombination with the genome, in accordance with standard techniques. The nucleic acid may be on an extra-chromosomal vector within the cell.

A still further aspect provides a method which includes introducing the nucleic acid into a host cell. The introduction, which may (particularly for in vitro introduction) be generally referred to without limitation as “transformation”, may employ any available technique. For eukaryotic cells, suitable techniques may include calcium phosphate transfection, DEAE-Dextran, electroporation, liposome-mediated transfection and transduction using retrovirus or other virus, e.g. vaccinia or, for insect cells, baculovirus. For bacterial cells, suitable techniques may include calcium chloride transformation, electroporation and transfection using bacteriophage.

Marker genes such as antibiotic resistance or sensitivity genes may be used in identifying clones containing nucleic acid of interest, as is well known in the art.

The introduction may be followed by causing or allowing expression from the nucleic acid, e.g. by culturing host cells (which may include cells actually transformed although more likely the cells will be descendants of the transformed cells) under conditions for expression of the gene, so that the encoded polypeptide is produced. If the polypeptide is expressed coupled to an appropriate signal leader peptide it may be secreted from the cell into the culture medium. Following production by expression, a polypeptide may be isolated and/or purified from the host cell and/or culture medium, as the case may be, and subsequently used as desired, e.g. in the formulation of a composition which may include one or more additional components, such as a pharmaceutical composition which includes one or more pharmaceutically acceptable excipients, vehicles or carriers (e.g. see below).

Introduction of nucleic acid may take place in vivo by way of gene therapy, as discussed below. A host cell containing nucleic acid according to the present invention, e.g. as a result of introduction of the nucleic acid into the cell or into an ancestor of the cell and/or genetic alteration of the sequence endogenous to the cell or ancestor (which introduction or alteration may take place in vivo or ex vivo), may be comprised (e.g. in the soma) within an organism which is an animal, particularly a mammal, which may be human or non-human, such as rabbit, guinea pig, rat, mouse or other rodent, cat, dog, pig, sheep, goat, cattle or horse, or which is a bird, such as a chicken. Genetically modified or transgenic animals or birds comprising such a cell are also provided as further aspects of the present invention.

Thus, in various further aspects, the present invention provides a non-human animal with a human LRP5 transgene within its genome. The transgene may have the sequence of any of the isoforms identified herein or a mutant, derivative, allele or variant thereof as disclosed. In one preferred embodiment, the heterologous human LRP5 sequence replaces the endogenous animal sequence. In other preferred embodiments, one or more copies of the human LRP5 sequence are added to the animal genome.

Preferably the animal is a rodent, and most preferably mouse or rat.

This may have a therapeutic aim. (Gene therapy is discussed below.) The presence of a mutant, allele or variant sequence within cells of an organism, particularly when in place of a homologous endogenous sequence, may allow the organism to be used as a model in testing and/or studying the role of the LRP5 gene or substances which modulate activity of the encoded polypeptide and/or promoter in vitro or are otherwise indicated to be of therapeutic potential.

An animal model for LRP5 deficiency may be constructed using standard techniques for introducing mutations into an animal germ-line. In one example of this approach, using a mouse, a vector carrying an insertional mutation within the LRP5 gene may be transfected into embryonic stem cells. A selectable marker, for example an antibiotic resistance gene such as neoR, may be included to facilitate selection of clones in which the mutant gene has replaced the endogenous wild type homologue. Such clones may also be identified or further investigated by Southern blot hybridisation. The clones may then be expanded and cells injected into mouse blastocyst stage embryos. Mice in which the injected cells have contributed to the development of the mouse may be identified by Southern blotting. These chimeric mice may then be bred to produce mice which carry one copy of the mutation in the germ line. These heterozygous mutant animals may then be bred to produce mice carrying mutations in the gene homozygously. The mice having a heterozygous mutation in the LRP5 gene may be a suitable model for human individuals having one copy of the gene mutated in the germ line who are at risk of developing IDDM or other disease.
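The breeding step from heterozygous animals to homozygous mutants follows simple single-locus inheritance. The Python sketch below enumerates the gametes of two parents and reports the expected genotype frequencies as exact fractions. It assumes Mendelian segregation and equal viability of all genotypes, which would have to be confirmed experimentally for any particular mutation. The genotype notation (`+` for wild type, `m` for mutant) and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Expected offspring genotype frequencies for a single-locus cross.

    Each parent is a two-character string of alleles, e.g. "+m" for a
    heterozygote with one wild-type (+) and one mutant (m) allele.
    Assumes Mendelian segregation and equal viability of all genotypes.
    """
    # Each pairing of one allele from each parent is equally likely;
    # sorting makes "+m" and "m+" the same genotype.
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}
```

Under these assumptions, `cross("+m", "+m")` gives `{"++": 1/4, "+m": 1/2, "mm": 1/4}`, so about one offspring in four from a heterozygote intercross would be expected to carry the mutation homozygously.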

Animal models may also be useful for any of the various diseases discussed elsewhere herein.

Instead of or as well as being used for the production of a polypeptide encoded by a transgene, host cells may be used as a nucleic acid factory to replicate the nucleic acid of interest in order to generate large amounts of it. Multiple copies of nucleic acid of interest may be made within a cell when coupled to an amplifiable gene such as dihydrofolate reductase (DHFR), as is well known. Host cells transformed with nucleic acid of interest, or which are descended from host cells into which nucleic acid was introduced, may be cultured under suitable conditions, e.g. in a fermentor, taken from the culture and subjected to processing to purify the nucleic acid. Following purification, the nucleic acid or one or more fragments thereof may be used as desired, for instance in a diagnostic or prognostic assay as discussed elsewhere herein.

The provision of the novel LRP-5 polypeptide isoforms and mutants, alleles, variants and derivatives enables for the first time the production of antibodies able to bind these molecules specifically.

Accordingly, a further aspect of the present invention provides an antibody able to bind specifically to the polypeptide whose sequence is given in a figure herein. Such an antibody may be specific in the sense of being able to distinguish between the polypeptide it is able to bind and other human polypeptides for which it has no or substantially no binding affinity (e.g. a binding affinity of about 1000× less). Specific antibodies bind an epitope on the molecule which is either not present or is not accessible on other molecules. Antibodies according to the present invention may be specific for the wild-type polypeptide. Antibodies according to the invention may be specific for a particular mutant, variant, allele or derivative polypeptide as between that molecule and the wild-type polypeptide, so as to be useful in diagnostic and prognostic methods as discussed below. Antibodies are also useful in purifying the polypeptide or polypeptides to which they bind, e.g. following production by recombinant expression from encoding nucleic acid.

Preferred antibodies according to the invention are isolated, in the sense of being free from contaminants such as antibodies able to bind other polypeptides and/or free of serum components. Monoclonal antibodies are preferred for some purposes, though polyclonal antibodies are within the scope of the present invention.

Antibodies may be obtained using techniques which are standard in the art. Methods of producing antibodies include immunising a mammal (e.g. mouse, rat, rabbit, horse, goat, sheep or monkey) with the protein or a fragment thereof. Antibodies may be obtained from immunised animals using any of a variety of techniques known in the art, and screened, preferably using binding of antibody to antigen of interest. For instance, Western blotting techniques or immunoprecipitation may be used (Armitage et al., 1992, Nature 357: 80-82). Isolation of antibodies and/or antibody-producing cells from an animal may be accompanied by a step of sacrificing the animal.

As an alternative or supplement to immunising a mammal with a peptide, an antibody specific for a protein may be obtained from a recombinantly produced library of expressed immunoglobulin variable domains, e.g. using lambda bacteriophage or filamentous bacteriophage which display functional immunoglobulin binding domains on their surfaces; for instance see WO92/01047. The library may be naive, that is constructed from sequences obtained from an organism which has not been immunised with any of the proteins (or fragments), or may be one constructed using sequences obtained from an organism which has been exposed to the antigen of interest.

Suitable peptides for use in immunising an animal and/or isolating anti-LRP5 antibody include any of the following amino acid sequences:

SYFHLFPPPPSPCTDSS (SEQ ID NO:403)

VDGRQNIKRAKDDGT (SEQ ID NO:404)

EVLFTTGLIRPVALVVDN (SEQ ID NO:405)

IQGHLDFVMDILVFHS. (SEQ ID NO:406)

Antibodies according to the present invention may be modified in a number of ways. Indeed the term “antibody” should be construed as covering any binding substance having a binding domain with the required specificity. Thus the invention covers antibody fragments, derivatives, functional equivalents and homologues of antibodies, including synthetic molecules and molecules whose shape mimics that of an antibody enabling it to bind an antigen or epitope.

Example antibody fragments capable of binding an antigen or other binding partner are the Fab fragment consisting of the VL, VH, CL and CH1 domains; the Fd fragment consisting of the VH and CH1 domains; the Fv fragment consisting of the VL and VH domains of a single arm of an antibody; the dAb fragment which consists of a VH domain; isolated CDR regions; and F(ab′)2 fragments, a bivalent fragment including two Fab fragments linked by a disulphide bridge at the hinge region. Single chain Fv fragments are also included.

A hybridoma producing a monoclonal antibody according to the present invention may be subject to genetic mutation or other changes. It will further be understood by those skilled in the art that a monoclonal antibody can be subjected to the techniques of recombinant DNA technology to produce other antibodies or chimeric molecules which retain the specificity of the original antibody. Such techniques may involve introducing DNA encoding the immunoglobulin variable region, or the complementarity determining regions (CDRs), of an antibody to the constant regions, or constant regions plus framework regions, of a different immunoglobulin. See, for instance, EP184187A, GB 2188638A or EP-A-0239400. Cloning and expression of chimeric antibodies are described in EP-A-0120694 and EP-A-0125023.

Hybridomas capable of producing antibody with desired binding characteristics are within the scope of the present invention, as are host cells, eukaryotic or prokaryotic, containing nucleic acid encoding antibodies (including antibody fragments) and capable of their expression. The invention also provides methods of production of the antibodies including growing a cell capable of producing the antibody under conditions in which the antibody is produced, and preferably secreted.

The reactivities of antibodies on a sample may be determined by any appropriate means. Tagging with individual reporter molecules is one possibility. The reporter molecules may directly or indirectly generate detectable, and preferably measurable, signals. The linkage of reporter molecules may be direct or indirect, covalent (e.g. via a peptide bond) or non-covalent. Linkage via a peptide bond may be as a result of recombinant expression of a gene fusion encoding antibody and reporter molecule.

One favoured mode is by covalent linkage of each antibody with an individual fluorochrome, phosphor or laser dye with spectrally isolated absorption or emission characteristics. Suitable fluorochromes include fluorescein, rhodamine, phycoerythrin and Texas Red. Suitable chromogenic dyes include diaminobenzidine.

Other reporters include macromolecular colloidal particles or particulate material such as latex beads that are coloured, magnetic or paramagnetic, and biologically or chemically active agents that can directly or indirectly cause detectable signals to be visually observed, electronically detected or otherwise recorded. These molecules may be enzymes which catalyse reactions that develop or change colours or cause changes in electrical properties, for example. They may be molecularly excitable, such that electronic transitions between energy states result in characteristic spectral absorptions or emissions. They may include chemical entities used in conjunction with biosensors. Biotin/avidin or biotin/streptavidin and alkaline phosphatase detection systems may be employed.

The mode of determining binding is not a feature of the present invention and those skilled in the art are able to choose a suitable mode according to their preference and general knowledge. Particular embodiments of antibodies according to the present invention include antibodies able to bind and/or which bind specifically, e.g. with an affinity of at least 10⁻⁷ M, to one of the following peptides:

SYFHLFPPPPSPCTDSS (SEQ ID NO:403)

VDGRQNIKRAKDDGT (SEQ ID NO:404)

EVLFTTGLIRPVALVVDN (SEQ ID NO:405)

IQGHLDFVMDILVFHS. (SEQ ID NO:406)
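As a rough illustration of what such affinity figures imply, the sketch below applies a simple 1:1 equilibrium binding model. It assumes the 10⁻⁷ M figure is read as a dissociation constant (Kd) and that antigen is in excess, so that free and total antigen concentrations are approximately equal; the function name fraction_bound is hypothetical.

```python
def fraction_bound(antigen_molar: float, kd_molar: float) -> float:
    """Fraction of antibody binding sites occupied at equilibrium.

    Simple 1:1 model, theta = [L] / ([L] + Kd), assuming the antigen is in
    excess so that its free concentration is close to its total concentration.
    """
    if antigen_molar < 0 or kd_molar <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return antigen_molar / (antigen_molar + kd_molar)


specific = fraction_bound(1e-7, 1e-7)      # Kd of 10^-7 M
nonspecific = fraction_bound(1e-7, 1e-4)   # about 1000x weaker binding
print(f"{specific:.3f} vs {nonspecific:.5f}")
```

At an antigen concentration equal to Kd (10⁻⁷ M) half of the binding sites are occupied, whereas for a polypeptide bound about 1000× less strongly (Kd of 10⁻⁴ M) occupancy at the same concentration falls to roughly 0.1%.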

Antibodies according to the present invention may be used in screening for the presence of a polypeptide, for example in a test sample containing cells or cell lysate as discussed, and may be used in purifying and/or isolating a polypeptide according to the present invention, for instance following production of the polypeptide by expression from encoding nucleic acid therefor. Antibodies may modulate the activity of the polypeptide to which they bind and so, if that polypeptide has a deleterious effect in an individual, may be useful in a therapeutic context (which may include prophylaxis).

An antibody may be provided in a kit, which may include instructions for use of the antibody, e.g. in determining the presence of a particular substance in a test sample. One or more other reagents may be included, such as labelling molecules, buffer solutions, elutants and so on. Reagents may be provided within containers which protect them from the external environment, such as a sealed vial.

The identification of the LRP5 gene and indications of its association with IDDM and other diseases pave the way for aspects of the present invention to provide the use of materials and methods, such as are disclosed and discussed above, for establishing the presence or absence in a test sample of a variant form of the gene, in particular an allele or variant specifically associated with IDDM or other disease. This may be for diagnosing a predisposition of an individual to IDDM or other disease. It may be for diagnosing IDDM in a patient with the disease as being associated with the IDDM4 gene.

This allows for planning of appropriate therapeutic and/or prophylactic treatment, permitting streamlining of treatment by targeting those most likely to benefit.

A variant form of the gene may contain one or more insertions, deletions, substitutions and/or additions of one or more nucleotides compared with the wild-type sequence (such as shown in Table 5 or Table 6) which may or may not disrupt the gene function. Differences at the nucleic acid level are not necessarily reflected by a difference in the amino acid sequence of the encoded polypeptide. However, a mutation or other difference in a gene may result in a frame-shift or stop codon, which could seriously affect the nature of the polypeptide produced (if any), or a point mutation or gross mutational change to the encoded polypeptide, including insertion, deletion, substitution and/or addition of one or more amino acids or regions in the polypeptide. A mutation in a promoter sequence or other regulatory region may prevent or reduce expression from the gene or affect the processing or stability of the mRNA transcript. For instance, a sequence alteration may affect alternative splicing of mRNA. As discussed, various LRP5 isoforms resulting from alternative splicing are provided by the present invention.

There are various methods for determining the presence or absence in a test sample of a particular nucleic acid sequence, such as the sequence shown in any figure herein, or a mutant, variant or allele thereof, e.g. including an alteration shown in Table 5 or Table 6.

Tests may be carried out on preparations containing genomic DNA, cDNA and/or mRNA. Testing cDNA or mRNA has the advantage of the complexity of the nucleic acid being reduced by the absence of intron sequences, but the possible disadvantage of extra time and effort being required in making the preparations. RNA is more difficult to manipulate than DNA because of the widespread occurrence of RNases. Nucleic acid in a test sample may be sequenced and the sequence compared with the sequence shown in any of the figures herein, to determine whether or not a difference is present. If so, the difference can be compared with known susceptibility alleles (e.g. as shown in Table 5 or Table 6) to determine whether the test nucleic acid contains one or more of the variations indicated, or the difference can be investigated for association with IDDM or other disease.
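The comparison step can be sketched in a few lines of Python. This is a minimal illustration, assuming the test and reference sequences have already been aligned to the same length, so only single-base substitutions are reported; detecting insertions or deletions would need a proper alignment first. The function name find_substitutions is hypothetical.

```python
def find_substitutions(reference: str, test: str) -> list:
    """Return (1-based position, reference base, test base) for each mismatch.

    Assumes both sequences are already aligned to the same length, so only
    single-base substitutions are reported; insertions and deletions would
    need a proper alignment step first.
    """
    if len(reference) != len(test):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (i + 1, ref_base, test_base)
        for i, (ref_base, test_base) in enumerate(zip(reference.upper(), test.upper()))
        if ref_base != test_base
    ]
```

For example, find_substitutions("ACGTACGT", "ACGAACGT") returns [(4, "T", "A")]. Each reported position could then be checked against known susceptibility alterations such as those listed in Table 5 or Table 6.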

Since it will not generally be time- or labour-efficient to sequence all nucleic acid in a test sample or even the whole LRP5 gene, a specific amplification reaction such as PCR using one or more pairs of primers may be employed to amplify the region of interest in the nucleic acid, for instance the LRP5 gene or a particular region in which polymorphisms associated with IDDM or other disease susceptibility occur. The amplified nucleic acid may then be sequenced as above, and/or tested in any other way to determine the presence or absence of a particular feature. Nucleic acid for testing may be prepared from nucleic acid removed from cells or in a library using a variety of other techniques such as restriction enzyme digest and electrophoresis.
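How a primer pair defines the amplified region can be illustrated with a simple in-silico PCR sketch. It assumes exact primer matches only (real annealing tolerates some mismatches) and a template given as one strand written 5′ to 3′; the function names and the toy sequences in the usage note are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predict_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Predict the PCR product for a primer pair on one template strand.

    Both primers are written 5'->3'. The forward primer must match the template
    exactly; the reverse primer anneals to the opposite strand, so its reverse
    complement is searched for downstream of the forward primer. Returns None if
    no product is expected. Mismatch tolerance is deliberately not modelled.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]
```

With the toy template "GGGAAACCCTTTGGGCCCATATAAA", forward primer "AAACCC" and reverse primer "TATGGG", the predicted product is "AAACCCTTTGGGCCCATA", which could then be sequenced or tested as described.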

Nucleic acid may be screened using a variant- or allele-specific probe. Such a probe corresponds in sequence to a region of the LRP5 gene, or its complement, containing a sequence alteration known to be associated with IDDM or other disease susceptibility. Under suitably stringent conditions, specific hybridisation of such a probe to test nucleic acid is indicative of the presence of the sequence alteration in the test nucleic acid. For efficient screening purposes, more than one probe may be used on the same test sample.

Allele- or variant-specific oligonucleotides may similarly be used in PCR to specifically amplify particular sequences if present in a test sample. Assessment of whether a PCR band contains a gene variant may be carried out in a number of ways familiar to those skilled in the art. The PCR product may for instance be treated in a way that enables one to display the polymorphism on a denaturing polyacrylamide DNA sequencing gel, with specific bands that are linked to the gene variants being selected.

SSCP heteroduplex analysis may be used for screening DNA fragments for sequence variants/mutations. It generally involves amplifying radiolabelled 100-300 bp fragments of the gene, diluting these products and denaturing at 95° C. The fragments are quick-cooled on ice so that the DNA remains in single-stranded form. These single-stranded fragments are run through acrylamide-based gels. Differences in the sequence composition will cause the single-stranded molecules to adopt different conformations in this gel matrix, making their mobility different from wild-type fragments and thus allowing detection of mutations in the fragments being analysed relative to a control fragment upon exposure of the gel to X-ray film. Fragments with altered mobility/conformations may be directly excised from the gel and directly sequenced for mutation.

Sequencing of a PCR product may involve precipitation with isopropanol, resuspension and sequencing using a TaqFS+Dye terminator sequencing kit. Extension products may be electrophoresed on an ABI 377 DNA sequencer and data analysed using Sequence Navigator software.

A further possible screening approach employs a PTT (protein truncation test) assay in which fragments are amplified with primers that contain the consensus Kozak initiation sequences and a T7 RNA polymerase promoter. These extra sequences are incorporated into the 5′ primer such that they are in frame with the native coding sequence of the fragment being analysed. These PCR products are introduced into a coupled transcription/translation system. This reaction allows the production of RNA from the fragment and translation of this RNA into a protein fragment. PCR products from controls make a protein product of a wild-type size relative to the size of the fragment being analysed. If the PCR product analysed has a frame-shift or nonsense mutation, the assay will yield a truncated protein product relative to controls. The size of the truncated product is related to the position of the mutation, and the relevant region of the gene from this patient may be sequenced to identify the truncating mutation.
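The relationship between truncation and mutation position can be sketched by locating the first in-frame stop codon. This assumes the sequence starts at the first codon of the reading frame (as the Kozak/ATG-containing primer is intended to ensure), uses the standard stop codons TAA, TAG and TGA, and ignores the extra primer-derived residues; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(coding_seq: str) -> int:
    """Number of codons read before the first in-frame stop codon.

    The sequence is assumed to start at the first codon of the reading frame.
    If no stop codon is found, the count of complete codons is returned, i.e.
    the full-length product expected from a control fragment.
    """
    seq = coding_seq.upper()
    n_codons = len(seq) // 3
    for i in range(n_codons):
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return n_codons
```

For example, a control fragment ATG GCT GCT GCT GCT gives 5 codons, whereas the same fragment with a nonsense change at codon 3 (ATG GCT TAA GCT GCT) gives 2, indicating both truncation and the approximate region to sequence.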

An alternative or supplement to looking for the presence of variant sequences in a test sample is to look for the presence of the normal sequence, e.g. using a suitably specific oligonucleotide probe or primer. Use of oligonucleotide probes and primers has been discussed in more detail above.

Allele- or variant-specific oligonucleotide probes or primers according to embodiments of the present invention may be selected from those shown in Table 4 (SEQ ID NOS:83-317), Table 7 (SEQ ID NOS:240-317) or Table 8 (SEQ ID NOS:318-333).

Approaches which rely on hybridisation between a probe and test nucleic acid and subsequent detection of a mis-match may be employed. Under appropriate conditions (temperature, pH etc.), an oligonucleotide probe will hybridise with a sequence which is not entirely complementary. The degree of base-pairing between the two molecules will be sufficient for them to anneal despite a mis-match. Various approaches are well known in the art for detecting the presence of a mis-match between two annealing nucleic acid molecules.

For instance, RNase A cleaves at the site of a mis-match. Cleavage can be detected by electrophoresing test nucleic acid to which the relevant probe or probes have annealed and looking for smaller molecules (i.e. molecules with higher electrophoretic mobility) than the full-length probe/test hybrid.

Thus, an oligonucleotide probe that has the sequence of a region of the normal LRP5 gene (either sense or anti-sense strand) in which mutations associated with IDDM or other disease susceptibility are known to occur (e.g. see Table 5 and Table 6) may be annealed to test nucleic acid and the presence or absence of a mis-match determined. Detection of the presence of a mis-match may indicate the presence in the test nucleic acid of a mutation associated with IDDM or other disease susceptibility. On the other hand, an oligonucleotide probe that has the sequence of a region of the gene including a mutation associated with IDDM or other disease susceptibility may be annealed to test nucleic acid and the presence or absence of a mis-match determined. The presence of a mis-match may indicate that the nucleic acid in the test sample has the normal sequence (the absence of a mis-match indicating that the test nucleic acid has the mutation). In either case, a battery of probes to different regions of the gene may be employed.
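The mis-match logic above can be sketched in silico. This minimal Python illustration assumes both probe and target window are written 5′ to 3′ and of equal length, and treats any non-Watson-Crick pairing as a mis-match; it does not model hybridisation stringency. The function name and the short sequences in the usage note are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mismatch_positions(probe: str, target_window: str) -> list:
    """1-based probe positions that cannot base-pair with the target window.

    Both sequences are written 5'->3'. A perfectly matched duplex requires the
    probe to equal the reverse complement of the target window.
    """
    if len(probe) != len(target_window):
        raise ValueError("probe and target window must be the same length")
    perfect_probe = target_window.upper().translate(_COMPLEMENT)[::-1]
    return [
        i + 1
        for i, (p, q) in enumerate(zip(probe.upper(), perfect_probe))
        if p != q
    ]
```

With a normal-sequence probe "ACGTTG", the fully complementary target "CAACGT" gives [] (no mis-match), while a target carrying a single change, "CAGCGT", gives [4]. With a variant-sequence probe the interpretation is reversed, as described above.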

The presence of differences in sequence of nucleic acid molecules may be detected by means of restriction enzyme digestion, such as in a method of DNA fingerprinting where the restriction pattern produced when one or more restriction enzymes are used to cut a sample of nucleic acid is compared with the pattern obtained when a sample containing the normal gene shown in a figure herein or a variant or allele, e.g. as containing an alteration shown in Table 5 or Table 6, is digested with the same enzyme or enzymes.
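How a sequence change alters a restriction pattern can be sketched by computing fragment lengths for a single enzyme. This assumes linear DNA and a palindromic recognition site (so scanning one strand is enough), and reports lengths on the scanned strand only; EcoRI (GAATTC, cutting after the G) is used purely as a familiar example, and the function name and sequences are hypothetical.

```python
def fragment_lengths(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths after cutting linear DNA at every copy of `site`.

    `cut_offset` is where the enzyme cuts within its site on the given strand
    (1 for EcoRI, G^AATTC). Only this strand is scanned, which is sufficient for
    palindromic recognition sites such as EcoRI's.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

For example, fragment_lengths("AAAGAATTCAAAAGAATTCAA", "GAATTC", 1) returns [4, 10, 7], whereas a single base change that destroys the second site ("AAAGAATTCAAAAGAGTTCAA") gives [4, 17]; the differing pattern is what a fingerprinting comparison would detect.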

The presence or absence of a lesion in a promoter or other regulatory sequence may also be assessed by determining the level of mRNA production by transcription or the level of polypeptide production by translation from the mRNA. Determination of promoter activity has been discussed above.

A test sample of nucleic acid may be provided for example by extracting nucleic acid from cells or biological tissues or fluids, urine, saliva, faeces, a buccal swab, biopsy or preferably blood, or for pre-natal testing from the amnion, placenta or foetus itself.

There are various methods for determining the presence or absence in a test sample of a particular polypeptide, such as the polypeptide with the amino acid sequence shown in any figure herein or an amino acid sequence mutant, variant or allele thereof.

A sample may be tested for the presence of a binding partner for a specific binding member such as an antibody (or mixture of antibodies) specific for the polypeptide shown in a figure herein, or specific for one or more particular variants of that polypeptide. In such cases, the sample may be tested by being contacted with a specific binding member such as an antibody under appropriate conditions for specific binding, before binding is determined, for instance using a reporter system as discussed. Where a panel of antibodies is used, different reporting labels may be employed for each antibody so that binding of each can be determined.

A specific binding member such as an antibody may be used to isolate and/or purify its binding partner polypeptide from a test sample, to allow for sequence and/or biochemical analysis of the polypeptide to determine whether it has the sequence and/or properties of the polypeptide whose sequence is disclosed herein, or if it is a mutant or variant form. Amino acid sequencing is routine in the art using automated sequencing machines.

A test sample containing one or more polypeptides may be provided for example as a crude or partially purified cell or cell lysate preparation, e.g. using tissues or cells, such as from saliva, faeces, or preferably blood, or for pre-natal testing from the amnion, placenta or foetus itself.

Whether it is a polypeptide, antibody, peptide, nucleic acid molecule, small molecule or other pharmaceutically useful compound according to the present invention that is to be given to an individual, administration is preferably in a “prophylactically effective amount” or a “therapeutically effective amount” (as the case may be, although prophylaxis may be considered therapy), this being sufficient to show benefit to the individual. The actual amount administered, and rate and time-course of administration, will depend on the nature and severity of what is being treated. Prescription of treatment, e.g. decisions on dosage etc., is within the responsibility of general practitioners and other medical doctors.

A composition may be administered alone or in combination with other treatments, either simultaneously or sequentially dependent upon the condition to be treated.

Pharmaceutical compositions according to the present invention, and for use in accordance with the present invention, may include, in addition to active ingredient, a pharmaceutically acceptable excipient, carrier, buffer, stabiliser or other materials well known to those skilled in the art. Such materials should be non-toxic and should not interfere with the efficacy of the active ingredient. The precise nature of the carrier or other material will depend on the route of administration, which may be oral, or by injection, e.g. cutaneous, subcutaneous or intravenous.

Pharmaceutical compositions for oral administration may be in tablet, capsule, powder or liquid form. A tablet may include a solid carrier such as gelatin or an adjuvant. Liquid pharmaceutical compositions generally include a liquid carrier such as water, petroleum, animal or vegetable oils, mineral oil or synthetic oil. Physiological saline solution, dextrose or other saccharide solution or glycols such as ethylene glycol, propylene glycol or polyethylene glycol may be included.

For intravenous, cutaneous or subcutaneous injection, or injection at the site of affliction, the active ingredient will be in the form of a parenterally acceptable aqueous solution which is pyrogen-free and has suitable pH, isotonicity and stability. Those of relevant skill in the art are well able to prepare suitable solutions using, for example, isotonic vehicles such as Sodium Chloride Injection, Ringer's Injection, or Lactated Ringer's Injection. Preservatives, stabilisers, buffers, antioxidants and/or other additives may be included, as required.

Targeting therapies may be used to deliver the active agent more specifically to certain types of cell, by the use of targeting systems such as antibody or cell-specific ligands. Targeting may be desirable for a variety of reasons; for example if the agent is unacceptably toxic, or if it would otherwise require too high a dosage, or if it would not otherwise be able to enter the target cells.

Instead of administering an agent directly, it may be produced in target cells by expression from an encoding gene introduced into the cells, e.g. in a viral vector (see below). The vector may be targeted to the specific cells to be treated, or it may contain regulatory elements which are switched on more or less selectively by the target cells. Viral vectors may be targeted using specific binding molecules, such as a sugar, glycolipid or protein such as an antibody or binding fragment thereof. Nucleic acid may be targeted by means of linkage to a protein ligand (such as an antibody or binding fragment thereof) via polylysine, with the ligand being specific for a receptor present on the surface of the target cells.

An agent may be administered in a precursor form, for conversion to an active form by an activating agent produced in, or targeted to, the cells to be treated. This type of approach is sometimes known as ADEPT or VDEPT; the former involves targeting the activating agent to the cells by conjugation to a cell-specific antibody, while the latter involves producing the activating agent, e.g. an enzyme, by expression from encoding DNA in a viral vector (see for example, EP-A-415731 and WO 90/07936).

Nucleic acid according to the present invention, e.g. encoding the authentic biologically active LRP-5 polypeptide or a functional fragment thereof, may be used in a method of gene therapy, to treat a patient who is unable to synthesize the active polypeptide or unable to synthesize it at the normal level, thereby providing the effect provided by the wild-type with the aim of treating and/or preventing one or more symptoms of IDDM and/or one or more other diseases.

Vectors such as viral vectors have been used to introduce genes into a wide variety of different target cells. Typically the vectors are exposed to the target cells so that transfection can take place in a sufficient proportion of the cells to provide a useful therapeutic or prophylactic effect from the expression of the desired polypeptide. The transfected nucleic acid may be permanently incorporated into the genome of each of the targeted cells, providing long-lasting effect, or alternatively the treatment may have to be repeated periodically.

A variety of vectors, both viral vectors and plasmid vectors, are known in the art, see e.g. U.S. Pat. No. 5,252,479 and WO 93/07282. In particular, a number of viruses have been used as gene transfer vectors, including adenovirus, papovaviruses, such as SV40, vaccinia virus, herpesviruses, including HSV and EBV, and retroviruses, including gibbon ape leukaemia virus, Rous Sarcoma Virus, Venezuelan equine encephalitis virus, Moloney murine leukaemia virus and murine mammary tumour virus. Many gene therapy protocols in the prior art have used disabled murine retroviruses.

Disabled virus vectors are produced in helper cell lines in which genes required for production of infectious viral particles are expressed. Helper cell lines are generally missing a sequence which is recognised by the mechanism which packages the viral genome, and produce virions which contain no nucleic acid. A viral vector which contains an intact packaging signal along with the gene or other sequence to be delivered (e.g. encoding the LRP5 polypeptide or a fragment thereof) is packaged in the helper cells into infectious virion particles, which may then be used for the gene delivery.

Other known methods of introducing nucleic acid into cells include electroporation, calcium phosphate co-precipitation, mechanical techniques such as microinjection, transfer mediated by liposomes and direct DNA uptake and receptor-mediated DNA transfer. Liposomes can encapsulate RNA, DNA and virions for delivery to cells. Depending on factors such as pH, ionic strength and divalent cations being present, the composition of liposomes may be tailored for targeting of particular cells or tissues. Liposomes include phospholipids and may include lipids and steroids and the composition of each such component may be altered. Targeting of liposomes may also be achieved using a specific binding pair member such as an antibody or binding fragment thereof, a sugar or a glycolipid.

The aim of gene therapy using nucleic acid encoding the polypeptide, or an active portion thereof, is to increase the amount of the expression product of the nucleic acid in cells in which the level of the wild-type polypeptide is absent or present only at reduced levels. Such treatment may be therapeutic or prophylactic, particularly in the treatment of individuals known through screening or testing to have an IDDM4 susceptibility allele and hence a predisposition to the disease.

Similar techniques may be used for anti-sense regulation of gene expression, e.g. targeting an antisense nucleic acid molecule to cells in which a mutant form of the gene is expressed, the aim being to reduce production of the mutant gene product. Other approaches to specific down-regulation of genes are well known, including the use of ribozymes designed to cleave specific nucleic acid sequences. Ribozymes are nucleic acid molecules, actually RNA, which specifically cleave single-stranded RNA, such as mRNA, at defined sequences, and their specificity can be engineered. Hammerhead ribozymes may be preferred because they recognise base sequences of about 11-18 bases in length, and so have greater specificity than ribozymes of the Tetrahymena type which recognise sequences of about 4 bases in length, though the latter type of ribozyme is useful in certain circumstances. References on the use of ribozymes include Marschall et al., Cellular and Molecular Neurobiology, 1994, 14(5): 523; Haseloff, Nature 334: 585 (1988) and Cech, J. Amer. Med. Assn., 260: 3030 (1988).
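The specificity advantage of a longer recognition sequence can be illustrated with a simple probability calculation. This sketch assumes a random sequence with equal, independent base frequencies, which real transcripts do not have, so it only shows how chance matches scale with recognition length; the function name and the 1000-nucleotide target length are hypothetical.

```python
def expected_chance_sites(target_length: int, site_length: int) -> float:
    """Expected number of chance matches to a fixed recognition sequence.

    Assumes a random sequence with equal, independent base frequencies, so each
    of the (L - n + 1) possible windows matches with probability 4**-n.
    """
    windows = max(target_length - site_length + 1, 0)
    return windows / 4 ** site_length


for n in (4, 11, 18):
    print(n, expected_chance_sites(1000, n))
```

For a 1000-nucleotide transcript, a 4-base recognition sequence is expected to occur by chance about 3.9 times, while an 11-base sequence is expected only about 0.0002 times, consistent with the greater specificity attributed to hammerhead ribozymes.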

Aspects of the present invention will now be illustrated with reference to the accompanying figures described already above and experimental exemplification, by way of example and not limitation. Further aspects and embodiments will be apparent to those of ordinary skill in the art. All documents mentioned in this specification are hereby incorporated herein by reference.

EXAMPLE 1 Cloning of LRP5

As noted above, confirmation of linkage to two of the 18 potential loci for IDDM predisposition was achieved by analysis of two family sets (102 UK families and 84 USA families): IDDM4 on chromosome 11q13 (MLS 1.3, P=0.01 at FGF3) and IDDM5 on chromosome 6q (MLS 1.8, P=0.003 at ESR). At IDDM4 the most significant linkage was obtained in the subset of families sharing 1 or 0 alleles IBD at HLA (MLSr=2.8; P=0.0002; λs=1.2) (Davies et al, 1994). This linkage was also observed by Hashimoto et al (1994) using 251 affected sibpairs, obtaining P=0.0008 in all sibpairs. Combining these results, with 596 families, provides substantial support for IDDM4 (P=1.5×10⁻⁶) (Todd and Farrall, 1996; Luo et al, 1996).

Multipoint analysis with other markers in the FGF3 region produced an MLS of 2.3 at FGF3 and D11S1883 (λs=1.19), and delineated the interval to a 27 cM region, flanked by the markers D11S903 and D11S527 (FIG. 1).

Multipoint linkage analysis cannot localise the gene to a small region unless several thousand multiplex families are available. Instead, association mapping, which can narrow the interval containing the disease gene to less than 2 cM or 2 Mb, has been used for rare single-gene diseases. Nevertheless, this method is highly unpredictable and has not previously been used to locate a polygene for a common disease. Association mapping has been used to locate the IDDM2/INS polygene, but this relied on the selection of a functional candidate polymorphism/gene and was restricted to a very small (<30 kb) region. Linkage disequilibrium (LD) or association studies were carried out in order to delineate the IDDM4 region to less than 2 cM. In theory, association of a particular allele very close to the founder mutation will be detected in populations descended from that founder. The transmission disequilibrium test (TDT, Spielman et al, 1993) measures association by assessing the deviation from 50% of the transmission of alleles from a marker locus from parents to affected children. The detection of association depends on the ancestry of each population studied being as homogeneous as possible, in order to reduce the possibility that several founder chromosomes are present, which would decrease the power to detect the association. These parameters are highly unpredictable.
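The TDT statistic itself is standard (Spielman et al, 1993): among heterozygous parents, b transmit a given marker allele to an affected child and c transmit the other allele, and (b − c)²/(b + c) is referred to a χ² distribution with 1 degree of freedom. The sketch below computes the percentage transmission, this statistic and its P value for a single allele; it does not apply any multi-allele correction, and the function name and the counts in the usage note are hypothetical.

```python
import math


def tdt(transmitted: int, not_transmitted: int) -> tuple:
    """Transmission disequilibrium test for one marker allele.

    transmitted:     heterozygous parents who passed the allele to an affected child
    not_transmitted: heterozygous parents who passed their other allele instead
    Returns (percentage transmission, chi-square with 1 d.f., P value).
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    percent = 100.0 * transmitted / total
    chi_square = (transmitted - not_transmitted) ** 2 / total
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return percent, chi_square, p_value
```

For example, if an allele is transmitted by 60 and not transmitted by 40 heterozygous parents, tdt(60, 40) gives 60% transmission and χ² = 4.0, corresponding to P ≈ 0.0455; equal counts give 50% transmission and P = 1.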

On analysis of markers spanning the IDDM4 linkage interval, LD was detected at D11S1917 (UT5620) in 554 families (P=0.01). A physical map of this region, comprising approximately 500 kb, was achieved by constructing a PAC, BAC and cosmid contig (FIG. 2). The region was physically mapped by hybridisation of markers onto restriction-enzyme-digested clones that had been resolved through agarose and Southern blotted.

Further microsatellites (both published, and those isolated from the clones by microsatellite rescue) were analysed within 1289 families from four different populations (UK, USA, Sardinia and Norway). An LD graph was constructed, with a peak at H0570POLYA (P=0.001), flanked by the markers D11S987 and 18018AC (FIG. 3). The LD detected at a polymorphic marker is influenced by allele frequency, and by whether the mutation causing susceptibility to type 1 diabetes arose on a chromosome where the allele in LD is the same allele as that on protective or neutral chromosomes. Where the marker being analysed has the same allele in LD with both susceptible and protective genotypes, these associations will remain undetected by single-point analysis: they effectively cancel each other out, showing little or no evidence for LD with the disease locus. The unpredictability of the method arising from this has already been noted above.

In order to maximise the information obtained with each marker, a three-point rolling LD curve was produced with the IDDM4 markers (FIG. 4). In this case the percentage transmission (IT) was calculated for a marker and its two immediate flanking markers, and averaged across them to minimise the effects of fluctuating allele frequency. This also produced a peak at H0570POLYA, with P=0.04, and indicates that the IDDM4 mutation is more likely to lie in the interval EB0864CA-D11S1337 (75 kb).
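The three-point rolling average can be sketched as follows. This is a minimal illustration, assuming markers are supplied in map order and that the per-marker percentage transmission values are averaged directly (the text does not say whether transmission counts were pooled instead). Marker names M1 to M5 and their values are hypothetical.

```python
from typing import Optional


def rolling_three_point(percent_t: list[float]) -> list[Optional[float]]:
    """Average each marker's %T with its two immediate flanking markers.

    Markers must be in map order. The two end markers have only one flank,
    so they get None instead of a three-point value.
    """
    smoothed: list[Optional[float]] = []
    for i in range(len(percent_t)):
        if 0 < i < len(percent_t) - 1:
            smoothed.append(sum(percent_t[i - 1:i + 2]) / 3.0)
        else:
            smoothed.append(None)
    return smoothed


# Hypothetical markers in map order with made-up %T values.
markers = ["M1", "M2", "M3", "M4", "M5"]
curve = rolling_three_point([50.0, 53.0, 56.0, 52.0, 50.0])
peak = max((v, m) for m, v in zip(markers, curve) if v is not None)
print(peak)
```

Smoothing in this way damps a single marker whose signal is inflated or cancelled by its own allele frequencies, at the cost of some positional resolution.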

The identification of this 75 kb interval showing association with type 1 diabetes allowed disease-associated haplotypes to be identified. These are derived from the original founder chromosomes on which the diabetes mutation or mutations IDDM4 arose. In order to identify the mutation causing susceptibility to type 1 diabetes, a refined linkage disequilibrium curve, based on single nucleotide polymorphisms (SNPs) and haplotypes, is constructed. SNPs are identified by sequencing individuals with specific haplotypes identified from the microsatellite analysis: homozygous susceptible to type 1 diabetes, homozygous protective for type 1 diabetes, and controls. One of these SNPs may be the etiological mutation IDDM4, or may be in very strong linkage disequilibrium with the primary disease locus, and hence lie at a peak of the refined curve. Cross-match analysis further reduces the number of candidate SNPs, as shown by the localisation of the IDDM2 mutation by this method (Bennett et al, 1995; Bennett and Todd, 1996). This requires identification of distinct haplotypes or founder chromosomes that have a different arrangement of alleles from the main susceptible or protective haplotypes, so that association or transmission of candidate SNP alleles can be tested in different haplotype backgrounds. The candidate mutations can be assessed for effects on gene function or regulation.

In different populations, different IDDM4 mutations may have arisen in the same gene. We are sequencing several putative founder chromosomes, or disease-associated haplotypes, from several unrelated individuals from different populations to identify candidate mutations for IDDM4 that cluster in the same gene.

To carry out an extensive search for DNA mutations or polymorphisms, the entire associated region and its flanking regions were sequenced (the 75 kb core region and 125 kb of flanking DNA). The DNA sequence also aids gene identification and is complementary to other methods of gene identification, such as cDNA selection, or gene identification by DNA sequencing and comparative analysis of homologous mouse genomic DNA.

Various strategies were used to identify potential coding sequences within this region (sequencing, computer prediction of putative exons and promoters, and cDNA selection), so as to increase the likelihood of identifying all the genes within this interval.

Construction of Libraries for Shotgun Sequencing

DNA was prepared from cosmids, BACs (Bacterial Artificial Chromosomes), or PACs (P1 Artificial Chromosomes). Cells containing the vector were streaked on Luria-Bertani (LB) agar plates supplemented with the appropriate antibiotic. A single colony was used to inoculate 200 ml of LB medium supplemented with the appropriate antibiotic and grown overnight at 37° C. The cells were pelleted by centrifugation and plasmid DNA was prepared following the QIAGEN (Chatsworth, Calif.) Tip500 Maxi plasmid/cosmid purification protocol with the following modifications: the cells from 100 ml of culture were used for each Tip500 column, the NaCl concentration of the elution buffer was increased from 1.25 M to 1.7 M, and the elution buffer was heated to 65° C.

Purified BAC and PAC DNA was digested with Not I restriction endonuclease and then subjected to pulsed-field gel electrophoresis using a BioRad CHEF Mapper system (Richmond, Calif.). The digested DNA was electrophoresed overnight in a low melting temperature agarose (BioRad, Richmond, Calif.) gel prepared with 0.5× Tris Borate EDTA (10× stock solution, Fisher, Pittsburgh, Pa.). The CHEF Mapper autoalgorithm default settings were used for switching times and voltages. Following electrophoresis the gel was stained with ethidium bromide (Sigma, St. Louis, Mo.) and visualized with an ultraviolet transilluminator. The insert band(s) were excised from the gel. The DNA was eluted from the gel slice by beta-Agarase (New England Biolabs, Beverly, Mass.) digestion according to the manufacturer's instructions. The solution containing the DNA and digested agarose was brought to 50 mM Tris pH 8.0, 15 mM MgCl₂, and 25% glycerol in a volume of 2 ml and placed in an AERO-MIST nebulizer (CIS-US, Bedford, Mass.). The nebulizer was attached to a nitrogen gas source and the DNA was randomly sheared at 10 psi for 30 sec. The sheared DNA was ethanol precipitated and resuspended in TE (10 mM Tris, 1 mM EDTA). The ends were made blunt by treatment with Mung Bean Nuclease (Promega, Madison, Wis.) at 30° C. for 30 min, followed by phenol/chloroform extraction and treatment with T4 DNA polymerase (GIBCO/BRL, Gaithersburg, Md.) in multicore buffer (Promega, Madison, Wis.) in the presence of 40 uM dNTPs at 16° C. To facilitate subcloning of the DNA fragments, BstX I adapters (Invitrogen, Carlsbad, Calif.) were ligated to the fragments at 14° C. overnight with T4 DNA ligase (Promega, Madison, Wis.). Adapters and DNA fragments less than 500 bp were removed by column chromatography using a cDNA sizing column (GIBCO/BRL, Gaithersburg, Md.) according to the manufacturer's instructions. Fractions containing DNA greater than 1 kb were pooled and concentrated by ethanol precipitation.
The DNA fragments containing BstX I adapters were ligated into the BstX I sites of pSHOT II, which was constructed by subcloning the BstX I sites from pcDNA II (Invitrogen, Carlsbad, Calif.) into the BssH II sites of pBlueScript (Stratagene, La Jolla, Calif.). pSHOT II was prepared by digestion with BstX I restriction endonuclease and purified by agarose gel electrophoresis. The gel-purified vector DNA was extracted from the agarose following the Prep-A-Gene (BioRad, Richmond, Calif.) protocol. To reduce self-ligation of the vector, the digested vector was treated with calf intestinal phosphatase (GIBCO/BRL, Gaithersburg, Md.). Ligation reactions of the DNA fragments with the cloning vector were transformed into ultra-competent XL-2 Blue cells (Stratagene, La Jolla, Calif.) and plated on LB agar plates supplemented with 100 ug/ml ampicillin. Individual colonies were picked into a 96-well plate containing 100 ul/well of LB broth supplemented with ampicillin and grown overnight at 37° C. Approximately 25 ul of 80% sterile glycerol was added to each well and the cultures were stored at −80° C.

Preparation of Plasmid DNA

Glycerol stocks were used to inoculate 5 ml of LB broth supplemented with 100 ug/ml ampicillin, either manually or by using a Tecan Genesis RSP 150 robot (Tecan AG, Hombrechtikon, Switzerland) programmed to inoculate 96 tubes containing 5 ml broth from the 96 wells. The cultures were grown overnight at 37° C. with shaking to provide aeration. Bacterial cells were pelleted by centrifugation, the supernatant decanted, and the cell pellet stored at −20° C. Plasmid DNA was prepared with a QIAGEN Bio Robot 9600 (QIAGEN, Chatsworth, Calif.) according to the Qiawell Ultra protocol. To test the frequency and size of inserts, plasmid DNA was digested with the restriction endonuclease Pvu II. The size of the restriction products was examined by agarose gel electrophoresis, the average insert size being 1 to 2 kb.

DNA Sequence Analysis of Shotgun Clones

DNA sequence analysis was performed using the ABI PRISM™ dye terminator cycle sequencing ready reaction kit with AmpliTaq DNA polymerase, FS (Perkin Elmer, Norwalk, Conn.), with M13 forward and reverse primers. Following amplification in a Perkin-Elmer 9600, the extension products were purified and analyzed on an ABI PRISM 377 automated sequencer (Perkin Elmer, Norwalk, Conn.). Approximately 12 to 15 sequencing reactions were performed per kb of DNA to be examined; e.g., 1500 reactions would be performed for a PAC insert of 100 kb.
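The reaction count above follows from simple arithmetic, and the resulting redundancy can be estimated with the usual coverage formula c = N × L / G together with the Lander-Waterman estimate e^(−c) for the fraction of bases left unsequenced. The sketch below assumes about 400 usable bases per read; that read length is an illustrative assumption, not a figure from this work, and the helper names are hypothetical.

```python
import math


def reactions_needed(insert_kb: float, reactions_per_kb: float = 15.0) -> int:
    """Shotgun sequencing reactions for an insert at a fixed density per kb."""
    return math.ceil(insert_kb * reactions_per_kb)


def fold_coverage(n_reads: int, read_len_bp: float, target_bp: float) -> float:
    """Redundancy c = N * L / G (total sequenced bases over target length)."""
    return n_reads * read_len_bp / target_bp


def fraction_unsequenced(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases not covered: exp(-c)."""
    return math.exp(-coverage)


n = reactions_needed(100)              # 100 kb PAC insert at 15 reactions/kb
c = fold_coverage(n, 400, 100_000)     # assumes ~400 usable bases per read
print(n, c, round(fraction_unsequenced(c), 4))
```

Under that read-length assumption, 1500 reactions give roughly six-fold redundancy, comparable to the five- to six-fold coverage at which gene prediction was applied.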

Assembly of DNA Sequences

Phred/Phrap was used for DNA sequence assembly. This program was developed by Dr. Phil Green and licensed from the University of Washington (Seattle, Wash.). Phred/Phrap consists of the following programs: Phred for base-calling, Phrap for sequence assembly, Crossmatch for sequence comparisons, Consed and Phrapview for visualization of data, and Repeatmasker for screening repetitive sequences. Vector and E. coli DNA sequences were identified by Crossmatch and removed from the DNA sequence assembly process. DNA sequence assembly was performed on a SUN Enterprise 4000 server running the Solaris 2.51 operating system (Sun Microsystems Inc., Mountain View, Calif.) using default Phrap parameters. The sequence assemblies were further analyzed using Consed and Phrapview.
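Phred reports each base call with a quality value on a logarithmic scale, Q = −10 log10(P), where P is the estimated probability that the call is wrong. The helpers below are a hypothetical sketch of that conversion and of the expected number of miscalled bases in a read; they are not part of the Phred/Phrap distribution.

```python
import math


def phred_to_error(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong: 10^(-q/10)."""
    return 10.0 ** (-q / 10.0)


def error_to_phred(p: float) -> float:
    """Phred quality for an error probability p: -10 * log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def expected_errors(qualities: list[int]) -> float:
    """Expected number of miscalled bases in a read (sum of per-base error rates)."""
    return sum(phred_to_error(q) for q in qualities)


print(phred_to_error(20), error_to_phred(0.001))
print(expected_errors([40, 30, 20, 10]))
```

A quality of 20 therefore means about 1 error in 100 calls, and 30 means about 1 in 1000.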

Bioinformatic Analysis of Assembled DNA Sequences

When the assembled DNA sequences approached five- to six-fold coverage of the region of interest, the exon and promoter prediction abilities of the program GRAIL (ApoCom, Oak Ridge) were used to aid gene identification. ApoCom GRAIL is a commercial version of the Department of Energy-developed GRAIL Gene Characterization Software, licensed to ApoCom Inc. by Lockheed Martin Energy Research Corporation, and the ApoCom Client Tool for Genomics (ACTG)™.

The DNA sequences at various stages of assembly were queried against the DNA sequences in the GenBank database (subject) using the BLAST algorithm (S. F. Altschul, et al. (1990) J. Mol. Biol. 215, 403-410) with default parameters. When examining large contiguous sequences of DNA, repetitive elements were masked following identification by Crossmatch with a database of mammalian repetitive elements. Following BLAST analysis, the results were compiled by a parser program written by Dr. Guochun Xie (Merck Research Lab). The parser provided the following information from the database for each DNA sequence having a similarity with a P value less than 10⁻⁶: the annotated name of the sequence, the database from which it was derived, the length and percent identity of the region of similarity, and the location of the similarity in both the query and the subject.
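The parser used in this work is not described further, so the following is only a stand-in sketch of the same filtering step. It assumes hits in the 12-column tabular layout produced by modern NCBI BLAST+ with `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), which is not the output format of the original tools, and it filters on E-value as a proxy for the P value threshold. The `Hit` class, `parse_hits` function and sample records are hypothetical.

```python
from dataclasses import dataclass

# Column order of BLAST+ "-outfmt 6" tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str                 # subject identifier, e.g. an accession number
    percent_identity: float
    length: int                  # length of the region of similarity
    query_span: tuple[int, int]
    subject_span: tuple[int, int]
    evalue: float


def parse_hits(lines, max_evalue: float = 1e-6) -> list[Hit]:
    """Keep hits whose E-value is below max_evalue; skip blanks and comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue = float(rec["evalue"])
        if evalue >= max_evalue:
            continue
        hits.append(Hit(
            query=rec["qseqid"],
            subject=rec["sseqid"],
            percent_identity=float(rec["pident"]),
            length=int(rec["length"]),
            query_span=(int(rec["qstart"]), int(rec["qend"])),
            subject_span=(int(rec["sstart"]), int(rec["send"])),
            evalue=evalue,
        ))
    return hits


# Two made-up tabular records: one strong hit and one weak hit.
sample = [
    "q1\tsA\t98.5\t420\t6\t0\t1\t420\t100\t519\t1e-150\t700",
    "q1\tsB\t70.0\t50\t15\t1\t10\t59\t5\t54\t0.5\t30",
]
kept = parse_hits(sample)
print([(h.subject, h.percent_identity, h.length) for h in kept])
```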

The BLAST analysis identified a high degree of similarity (90-100% identity) over a length of greater than 100 bp between the DNA sequences we obtained and a number of human EST sequences present in the database. These human EST sequences clustered into groups represented by accession numbers R73322, R50627 and F07016. In general, each EST cluster is presumed to represent a single gene. The DNA sequences in the 424-nucleotide R73322 cluster had a lower but significant degree of DNA sequence similarity to the gene encoding the LDL receptor related protein (GenBank accession number X13916) and several other members of the LDL receptor family. It was therefore concluded that the sequences highly similar to EST R73322 encoded a member of the LDL receptor family.
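Grouping sequences into clusters by a pairwise similarity threshold can be sketched with a union-find (disjoint-set) structure, giving single-linkage clusters. This assumes pairwise hits have already been computed as (id_a, id_b, percent identity, alignment length) tuples. The thresholds mirror the 90% identity over more than 100 bp stated above, but the clustering procedure itself is an assumption, since the text does not say how the EST clusters were formed; the identifiers are hypothetical.

```python
def cluster_sequences(ids, hits, min_identity=90.0, min_length=100):
    """Single-linkage clusters: join a and b when a hit meets both thresholds.

    hits: iterable of (id_a, id_b, percent_identity, alignment_length).
    """
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, length in hits:
        if identity >= min_identity and length > min_length:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical identifiers and hits.
ids = ["e1", "e2", "e3", "e4"]
hits = [("e1", "e2", 95.0, 300), ("e2", "e3", 92.0, 150), ("e3", "e4", 85.0, 400)]
print(cluster_sequences(ids, hits))
```

Single linkage lets e1 and e3 join through e2 even without a direct hit, which is also why separate clusters of one gene (as found below) can remain unconnected until bridging sequence is obtained.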

Members of each EST cluster were assembled using the program Sequencher (Perkin Elmer, Norwalk, Conn.). To increase the accuracy of the EST sequence data extracted from the database, relevant chromatogram trace files from the genomic DNA sequences obtained by shotgun sequencing were included in the assembly. The corrected EST sequences were reanalyzed by BLAST and BLASTX. For EST cluster 3, represented by accession number R50627, analysis of the edited EST assembly revealed that this cluster was similar to members of the LDL receptor family. This result suggested the possibility that these two EST clusters were components of the same gene.

Experimentally derived cDNA sequences were assembled using the program Sequencher (Perkin Elmer, Norwalk, Conn.). Genomic DNA sequences and cDNA sequences were compared using the program Crossmatch, which allowed rapid and sensitive detection of the location of exons. Intron/exon boundaries were then identified by manually comparing the genomic and cDNA sequences using the program GeneWorks (Intelligenetics Inc., Campbell, Calif.).
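Once exon positions on the genomic sequence are known, a quick sanity check on each proposed intron/exon boundary is whether the intron begins with GT and ends with AG, the canonical splice-site dinucleotides. The function below is a hypothetical sketch of that check, not the GeneWorks procedure; it assumes sorted, 0-based, end-exclusive exon coordinates on the forward strand. Non-canonical introns (for example GC-AG) exist, so a False result flags a boundary for review rather than rejecting it.

```python
def intron_boundaries(genomic: str, exons: list[tuple[int, int]]) -> list[tuple[str, str, bool]]:
    """For consecutive exons (0-based, end-exclusive, sorted), report the first
    two and last two bases of each intron and whether they match GT...AG."""
    genomic = genomic.upper()
    report = []
    for (_, end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[end:next_start]
        donor, acceptor = intron[:2], intron[-2:]
        report.append((donor, acceptor, donor == "GT" and acceptor == "AG"))
    return report


# Toy sequence: exon AAAA, intron GTCCCCAG, exon TTTT.
print(intron_boundaries("AAAAGTCCCCAGTTTT", [(0, 4), (12, 16)]))
```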

Northern Blot Analysis

Primers 256F and 622R (SEQ ID NOS:51,52; Table 2) were used to amplify a PCR product of 366 bp from a fetal brain cDNA library. This product was purified on an agarose gel, and the DNA was extracted and subcloned into pCR2.1 (Invitrogen, Carlsbad, Calif.). The 366 bp probe was labeled by random priming with the Amersham Rediprime kit (Arlington Heights, Ill.) in the presence of 50-100 uCi of 3000 Ci/mmole [alpha-³²P]dCTP (Dupont/NEN, Boston, Mass.). Unincorporated nucleotides were removed with a ProbeQuant G-50 spin column (Pharmacia Biotech, Piscataway, N.J.). The radiolabeled probe, at a concentration of greater than 1×10⁶ cpm/ml in rapid hybridization buffer (Clontech, Palo Alto, Calif.), was incubated overnight at 65° C. with human multiple tissue Northern blots I and II (Clontech, Palo Alto, Calif.). The blots were washed by two 15 min incubations in 2×SSC, 0.1% SDS (prepared from 20×SSC and 20% SDS stock solutions, Fisher, Pittsburgh, Pa.) at room temperature, followed by two 15 min incubations in 1×SSC, 0.1% SDS at room temperature, and two 30 min incubations in 0.1×SSC, 0.1% SDS at 60° C. Autoradiography of the blots was performed to visualize the bands that specifically hybridized to the radiolabeled probe.
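The two primer pairs named in this section for the 366 bp probe, 256F/622R and 512F/878R, both give 366 when the forward number is subtracted from the reverse number. That suggests, as an inference only, that these primer names encode positions on a cDNA numbering (apparently not the same numbering for both pairs); it should not be assumed to hold for every primer in the document. The hypothetical helper below makes the arithmetic explicit.

```python
import re


def amplicon_from_names(forward: str, reverse: str) -> int:
    """Expected product size if primer names encode cDNA positions (assumption)."""
    f = re.fullmatch(r"(\d+)F", forward)
    r = re.fullmatch(r"(\d+)R", reverse)
    if f is None or r is None:
        raise ValueError("expected names like '256F' and '622R'")
    size = int(r.group(1)) - int(f.group(1))
    if size <= 0:
        raise ValueError("reverse primer must lie 3' of forward primer")
    return size


print(amplicon_from_names("256F", "622R"), amplicon_from_names("512F", "878R"))
```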

The probe hybridized to an mRNA transcript of approximately 5-5.5 kb that is most highly expressed in placenta, liver, pancreas, and prostate. It is expressed at an intermediate level in lung, skeletal muscle, kidney, spleen, thymus, ovary, small intestine, and colon, and at a low level in brain, testis, and leukocytes. In tissues where the transcript is highly expressed, e.g. liver and pancreas, additional bands of 7 kb and 1.3 kb are observed.

Isolation of Full Length cDNAs

PCR-based techniques were used to extend regions that were highly similar to ESTs and regions identified by exon prediction software (GRAIL). One technique used is a variation on Rapid Amplification of cDNA Ends (RACE) termed Reduced Complexity cDNA Analysis (RCCA); similar procedures are reported by Munroe et al. (1995) PNAS 92: 2209-2213 and Wilfinger et al. (1997) BioTechniques 22: 481-486. This technique relies on a PCR template that is a pool of approximately 20,000 cDNA clones; this reduces the complexity of the template and increases the probability of obtaining longer PCR extensions. A second technique used to extend cDNAs was PCR between regions identified in the genomic sequence as potentially being portions of a gene, e.g. sequences that were very similar to ESTs or sequences identified by GRAIL. These PCR reactions were performed on cDNA prepared from approximately 5 ug of mRNA (Clontech, Palo Alto, Calif.) with the SuperScript™ Choice system (Gibco/BRL, Gaithersburg, Md.). First-strand cDNA synthesis was primed using 1 ug of oligo(dT)₁₂₋₁₈ primer and 25 ng of random hexamers per reaction. Second-strand cDNA synthesis was performed according to the manufacturer's instructions.

Identification of Additional Exons Related to EST Cluster 1

We screened 96 wells of a human fetal brain plasmid library, 20,000 clones per well, by amplifying a 366 bp PCR product using primers 256F and 622R. The reaction mix consisted of 4 ul of plasmid DNA (0.2 ng/ml), 10 mM Tris-HCl pH 8.3, 50 mM KCl, 10% sucrose, 2.5 mM MgCl₂, 0.1% Tartrazine, 200 uM dNTPs, 100 ng of each primer and 0.1 ul of Taq Gold (Perkin-Elmer, Norwalk, Conn.). A total reaction volume of 11 ul was incubated at 95° C. for 12 min, followed by 32 cycles of 95° C. for 30 sec, 60° C. for 30 sec and 72° C. for 30 sec. Approximately 20 wells were found to contain the correct 366 bp fragment by PCR analysis. 5′ and 3′ RACE was subsequently performed on several of the positive wells containing the plasmid cDNA library, using a vector-specific primer and a gene-specific primer. The vector-specific primers PBS 543R and PBS 873F were both used in combination with gene-specific primers 117F and 518R because the orientation of the insert was not known. PCR amplification conditions consisted of 1× TaKaRa Buffer LA, 2.5 mM MgCl₂, 500 uM dNTPs, 0.2 ul of TaKaRa LA Taq Polymerase (PanVera, Madison, Wis.), 100 ng of each primer and 5 ul of the plasmid library at 0.2 ng/ml. In a total reaction volume of 20 ul, the thermal cycling conditions were as follows: 92° C. for 30 sec, followed by 32 cycles of 92° C. for 30 sec, 1 min at 60° C. and 10 min at 68° C. After the initial PCR amplification, a nested or semi-nested PCR reaction was performed using nested vector primers PBS 578R and PBS 838F and various gene-specific primers (256F, 343F, 623R and 657R). The PCR products were separated from unincorporated dNTPs and primers using QIAGEN QIAquick PCR purification spin columns according to standard protocols and resuspended in 30 ul of water. The amplification conditions for the nested and semi-nested PCR were the same as for the initial PCR amplification, except that 3 ul of the purified PCR fragment was used as template and only 20 cycles were performed.
Products obtained from this PCR amplification were analyzed on 1% agarose gels; excised fragments were purified using QIAGEN QIAquick spin columns and sequenced using ABI dye-terminator sequencing kits. The products were analyzed on ABI 377 sequencers according to standard protocols.
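The pooled screen also allows a rough estimate of how abundant the target clone is in the library. If clones are distributed randomly among wells, the number of target clones per well is Poisson-distributed, so the fraction of negative wells equals e^(−λ). The sketch below applies this to the approximate figures above (about 20 positives out of 96 wells of 20,000 clones). It assumes a single target clone in a well always yields a PCR product; this estimate is illustrative and is not reported in the source.

```python
import math


def pool_frequency(n_wells: int, n_positive: int, clones_per_well: int) -> tuple[float, float]:
    """Estimate target clones per well (lambda) and per clone from positive pools.

    Assumes clones are distributed randomly among wells (Poisson) and that a
    single target clone in a well is enough to give a PCR product.
    """
    if not 0 <= n_positive < n_wells:
        raise ValueError("need at least one negative well")
    fraction_negative = (n_wells - n_positive) / n_wells
    clones_per_pool = -math.log(fraction_negative)   # Poisson mean, lambda
    return clones_per_pool, clones_per_pool / clones_per_well


lam, freq = pool_frequency(96, 20, 20_000)
print(f"lambda = {lam:.3f} per well; about 1 in {1 / freq:,.0f} clones")
```

Because "approximately 20" positives is itself approximate, the result should be read as an order-of-magnitude figure.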

Connection of EST Clusters 1-3

As discussed above, each EST cluster may represent a single gene; alternatively, the EST clusters may be portions of the same gene. To distinguish between these two possibilities, primers were designed to the two other EST clusters in the region, represented by EST accession numbers F07016 (cluster 2, containing 272 nucleotides) and R50627 (cluster 3, containing 1177 nucleotides). Primers from cluster 1 (117F and 499F) were paired with a primer from EST cluster 3 (4034R) in a PCR reaction. A 50 ul reaction was performed using TaKaRa LA Taq polymerase (PanVera, Madison, Wis.) in the reaction buffer supplied by the manufacturer, with the addition of 0.32 mM dNTPs, primers, and approximately 30 ng of lymph node cDNA. PCR products were amplified for 35 cycles of 94° C. for 30 sec, 60° C. for 30 sec, and 72° C. for 4 minutes. Products were electrophoresed on a 1% agarose gel, and bands of 2.5 to 3 kb were excised and subcloned into pCR 2.1 (Invitrogen, Carlsbad, Calif.), and plasmid DNA was prepared for DNA sequence analysis.

The product of the primary reaction described above, generated with a primer in EST cluster 1 (638F) and one in EST cluster 3 (4173R), was used as the template for a reaction with a primer from EST cluster 1 (638F) and one from EST cluster 2 (3556R). This semi-nested PCR reaction was performed with TaKaRa LA Taq polymerase as described in the previous paragraph. An approximately 2 kb product was generated and subcloned for DNA sequence analysis. Assembly of the DNA sequence results of these PCR products indicated that EST clusters 1 to 3 were part of the same gene and established their orientation relative to each other in the mRNA transcript produced by this gene.

PCR reactions were also performed between EST clusters 2 and 3. Amplification from liver cDNA using TaKaRa LA Taq polymerase (PanVera, Madison, Wis.) with the primers 2519F, 3011F, or 3154F (EST cluster 2) in combination with 5061R (EST cluster 3) was performed for 35 cycles of 95° C. for 30 sec, 60° C. for 60 sec, and 72° C. for 3 minutes. The PCR products were gel purified and subcloned, and the DNA sequence was determined. DNA sequence analysis of the ends of all these PCR products yielded most of the cDNA sequence; however, to provide complete DNA sequence of both strands, oligonucleotide primers were designed and used for DNA sequencing (FIG. 5(a) (SEQ ID NO: 1)).

Extension of the 5′ End

RCCA analysis was used to obtain a number of clones extended 5′ by using the internal gene-specific primers as described previously. Several clonal extensions were isolated; however, most of the clones analyzed stopped within exon A. One clone extended past the 5′ end of exon A, but the sequence was contiguous with genomic DNA; since a body of evidence indicates an intron/exon boundary at the 5′ end of exon A, it appeared likely that this extension resulted from unprocessed intronic sequence. A second clone, h10, extended past this point but diverged from the genomic DNA sequence. It was concluded that this represented a chimeric clone present in the original fetal brain cDNA library.

Identification of 5′ end of isoform 1

As described above, RCCA experiments yielded a number of independent clones that terminated at the 5′ end of exon A. This suggested that the human LRP5 gene contains a region that reverse transcriptase has difficulty transcribing. To circumvent this problem we decided to isolate the mouse ortholog of LRP5, since subtle differences in DNA sequence content can alter the ability of an enzyme to transcribe a region. To increase the probability of isolating the 5′ portion of the mouse gene, a human probe of 366 nucleotides, described above and derived from exons A and B, was used.

A cDNA library was constructed from mouse liver mRNA purchased from Clontech (Palo Alto, Calif.). cDNA was prepared using the SuperScript Choice system (Gibco/BRL, Gaithersburg, Md.) according to the manufacturer's instructions. Phosphorylated BstX I adapters (Invitrogen, San Diego, Calif.) were ligated to approximately 2 ug of mouse liver cDNA. The ligation mix was diluted and size-fractionated on a cDNA sizing column (Gibco/BRL, Gaithersburg, Md.). Drops from the column were collected and the eluted volume from the column was determined as described for the construction of shotgun libraries. The size-fractionated cDNA with the BstX I linkers was ligated into the vector pSHOT II, described above, which had been cut with the restriction endonuclease BstX I, gel purified, and dephosphorylated with calf intestinal phosphatase (Gibco/BRL, Gaithersburg, Md.). The ligation containing approximately 10-20 ng of cDNA and approximately 100 ng of vector was incubated overnight at 14° C. The ligation was transformed into XL-2 Blue Ultracompetent cells (Stratagene, La Jolla, Calif.). The transformed cells were spread on twenty 133 mm Colony/Plaque Screen filters (Dupont/NEN, Boston, Mass.) at a density of approximately 30,000 colonies per plate on Luria Broth agar plates supplemented with 100 ug/ml ampicillin (Sigma, St. Louis, Mo.). The colonies were grown overnight and then replica plated onto two duplicate filters. The replica filters were grown for several hours at 37° C. until the colonies were visible and processed for in situ hybridization of colonies according to established procedures (Maniatis, Fritsch and Sambrook, 1982). A Stratalinker (Stratagene, La Jolla, Calif.) was used to crosslink the DNA to the filter. The filters were hybridized overnight with greater than 1,000,000 cpm/ml probe in 1× hybridization buffer (Gibco/BRL, Gaithersburg, Md.) containing 50% formamide at 42° C. The probe was generated from a PCR product derived from the human LRP5 cDNA using primers 512F and 878R.
This probe was random-prime labeled with the Amersham Rediprime kit (Arlington Heights, Ill.) in the presence of 50-100 uCi of 3000 Ci/mmole [alpha-³²P]dCTP (Dupont/NEN, Boston, Mass.) and purified using a ProbeQuant G-50 spin column (Pharmacia Biotech, Piscataway, N.J.). The filters were washed with 0.1×SSC, 0.1% SDS at 42° C. Following autoradiography, individual regions containing hybridization-positive colonies were excised from the master filter and placed into 0.5 ml Luria Broth plus 20% glycerol. Each positive was replated at a density of approximately 50-200 colonies per 100 mm plate and screened by hybridization as described above. Single colonies were isolated and plasmid DNA was prepared for DNA sequence analysis.
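The depth of this screen (twenty filters at about 30,000 colonies each, roughly 600,000 colonies) can be related to the chance of recovering a clone of a given abundance using the Clarke-Carbon relation N = ln(1 − P) / ln(1 − f), where f is the fraction of the library represented by the target clone. The abundance f = 1/100,000 in the example is a hypothetical value chosen for illustration, not a measurement from this library, and the helper names are also hypothetical.

```python
import math


def clones_required(prob: float, fraction: float) -> int:
    """Clarke-Carbon: colonies needed to include a given clone with probability prob.

    N = ln(1 - prob) / ln(1 - fraction), where fraction is the clone's
    representation in the library.
    """
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - fraction))


def detection_probability(n_colonies: int, fraction: float) -> float:
    """Probability that at least one of n_colonies carries the target clone."""
    return 1.0 - (1.0 - fraction) ** n_colonies


screened = 20 * 30_000  # twenty filters at ~30,000 colonies each
print(screened, round(detection_probability(screened, 1e-5), 4))
print(clones_required(0.99, 1e-5))
```

The calculation ignores clones that are too short to contain the probe region, which matters here because the goal was specifically to recover 5′ sequence.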

Three clones were isolated from the mouse cDNA library. The assembled sequence of the clones (FIG. 16(a) (SEQ ID NO:35)) had a high degree of similarity (87% identical over an approximately 1700-nucleotide portion) with the human LRP5 gene and thus likely represents the mouse ortholog of LRP5. The 500-amino-acid portion of mouse LRP5 (FIG. 16(d) (SEQ ID NO:8)) that we initially obtained is 96% identical to human LRP5. Significantly, two of these clones had sequence 5′ of the region corresponding to exon A: clone 19a contained an additional 200 bp and clone 9a an additional 180 bp (FIG. 16(b) (SEQ ID NO:36)). The additional 200 bp contains an open reading frame that begins at bp 112 (FIG. 16(c) (SEQ ID NO:37)). The initiating codon has consensus nucleotides for efficient initiation of translation at both the −3 (purine) and +4 (G nucleotide) positions (Kozak, M. 1996, Mammalian Genome 7:563-574). This open reading frame encodes a peptide with the potential to act as a eukaryotic signal sequence for protein export (von Heijne, 1994, Ann. Rev. Biophys. Biomol. Struc. 23:167-192). The highest-scoring signal sequence cleavage site, as determined using the SigCleave program in the GCG analysis package (Genetics Computer Group, Madison, Wis.), generates a mature peptide beginning at residue 29 of isoform 1. Additional sites that may be used produce mature peptides beginning at amino acid residue 31 (the first amino acid encoded by exon A) or amino acid residues 32, 33, or 38.
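The two Kozak positions cited above can be checked mechanically. In Kozak numbering the A of ATG is +1, so −3 is three bases upstream of that A and +4 is the base immediately after the ATG. The function below is a hypothetical sketch of that check; the test sequence GCCACCATGG is the well-known Kozak consensus, not LRP5 sequence.

```python
def kozak_context(seq: str, atg_index: int) -> dict[str, bool]:
    """Check the two key Kozak positions around an ATG.

    atg_index is the 0-based index of the A in ATG (position +1). Position -3
    is three bases upstream of that A; position +4 is the base right after ATG.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    return {"purine_at_minus3": minus3 in ("A", "G"), "G_at_plus4": plus4 == "G"}


print(kozak_context("GCCACCATGG", 6))   # consensus context
print(kozak_context("GCCTCCATGA", 6))   # weak context
```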

Molecular Cloning of the Full Length Mouse Lrp3 cDNA

The mouse cDNA clones isolated by nucleic acid hybridization contain 1.7 kb of the 5′ end of the Lrp3 cDNA (FIG. 16(a) (SEQ ID NO:35)). This accounts for approximately one-third of the full-length cDNA when compared to the human cDNA sequence. The remainder of the mouse Lrp3 cDNA was isolated using PCR to amplify products from mouse liver cDNA. PCR primers (Table 9; SEQ ID NOS:49-74,334-402) were designed based on DNA sequences identified by sequence skimming of mouse genomic clones BACs 53-d-8 and 131-p-15, which contain the mouse Lrp3 gene. BAC 53-d-8 was mapped by FISH analysis to mouse chromosome 19, which is syntenic with human 11q13. Sequence skimming of these clones identified DNA sequences corresponding to the coding region of human LRP5 as well as the 3′ untranslated region. This strategy resulted in the determination of a mouse cDNA sequence of 5059 nucleotides (FIG. 18(a) (SEQ ID NO:40)), which contains an open reading frame of 4842 nucleotides (FIG. 18(b) (SEQ ID NO:41)) encoding a protein of 1614 amino acids (FIG. 18(c) (SEQ ID NO:42)). The putative ATG is in a sequence context favorable for initiation of translation (Kozak, M. 1996, Mammalian Genome 7:563-574).
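The mouse ORF figures are internally consistent: 4842 nucleotides divided by 3 gives 1614 codons, matching the 1614-amino-acid protein, which suggests the stated ORF length excludes the stop codon. Locating the longest ATG-initiated ORF in a cDNA can be sketched as below. This is a hypothetical helper that scans only the forward strand, as appropriate for an oriented cDNA, and the toy sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int, int]:
    """Longest ATG-initiated ORF on the forward strand.

    Returns (start, stop_codon_start, n_codons) with 0-based coordinates;
    n_codons counts ATG through the last sense codon, excluding the stop.
    Returns (0, 0, 0) if no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # runs off the end without a stop: no complete ORF
            n_codons = (j - i) // 3
            if n_codons > best[2]:
                best = (i, j, n_codons)
            i = j + 3  # resume scanning after the stop codon
    return best


start, stop, n = longest_orf("CCATGAAATTTGGGTAGCC")
print(start, stop, n)  # ORF length in nt is 3 * n, excluding the stop codon
```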

Comparison of Human and Mouse LRP5

The cDNA sequences of human and mouse LRP5 display 87% identity. The open reading frame of the human LRP5 cDNA encodes a protein of 1615 amino acids (SEQ ID NO:3) that is 94% identical to the 1614-amino-acid protein encoded by mouse Lrp3 (SEQ ID NO:42) (FIG. 18(d)). The difference in length is due to a single amino acid deletion in the mouse Lrp3 signal peptide sequence. The signal peptide sequence is not highly conserved, being less than 50% identical between human and mouse. The putative signal sequence cleavage site is at amino acid residue 25 in the human and amino acid 29 in the mouse. Cleavage at these sites would result in mature human and mouse proteins of 1591 and 1586 amino acids, respectively, which are 95% identical (FIG. 18(e) (SEQ ID NOS:43,44)). The high degree of overall sequence similarity argues strongly that the identified sequences are orthologs of the LRP5 gene. This hypothesis is further supported by the results of genomic Southern experiments (data not shown).
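The mature lengths above follow from the precursor lengths and the first mature residue: 1615 − 25 + 1 = 1591 for human and 1614 − 29 + 1 = 1586 for mouse. The sketch below encodes that arithmetic and a simple percent-identity calculation over aligned sequences. Identity conventions differ (for example, whether gapped columns count in the denominator), and the text does not state which one was used, so the gap-excluding convention here is an assumption; both helpers are hypothetical.

```python
def mature_length(precursor_len: int, first_mature_residue: int) -> int:
    """Length after signal peptide removal; residues are numbered from 1."""
    return precursor_len - first_mature_residue + 1


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("no ungapped columns")
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


print(mature_length(1615, 25), mature_length(1614, 29))
print(percent_identity("ACDE-G", "ACDF-G"))  # toy alignment
```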

Identification of Human Signal Peptide Exon for Isoform 1

The human exon encoding a signal peptide was isolated from liver cDNA by PCR. The forward primer 1F (SEQ ID NO:51) (Table 9) was used in combination with one of the following reverse primers: 218R, 265R, 318R, and 361R (SEQ ID NOS:50,52,53,54) in a PCR reaction using Taq Gold polymerase (Perkin-Elmer, Norwalk, Conn.) supplemented with 3, 5, or 7% DMSO. Products were amplified for 40 cycles of 30 sec at 95° C., 30 sec at 58° C., and 1 min at 72° C. The products were analyzed on an agarose gel, and some of the reactions containing bands of the predicted size were selected for subcloning into pCR2.1 (Invitrogen, San Diego, Calif.) and DNA sequence analysis.

The derived DNA sequence of 139 nucleotides upstream of exon 2 (also known as exon A) contains an ATG in a context for efficient initiation of translation: an adenine (A) residue at the −3 position and a guanine (G) residue at the +4 position (Kozak, M. 1996, Mammalian Genome 7:563-574). The open reading frame for this ATG continues for 4854 nucleotides (FIG. 5(b) (SEQ ID NO:2)) and encodes a polypeptide of 1615 amino acids (FIG. 5(c) (SEQ ID NO:3)).

The sequence following the initiator ATG codon encodes a peptide with the potential to act as a signal for protein export. The highest-scoring signal sequence cleavage site (score 15.3) identified by the SigCleave program in the GCG analysis package (Genetics Computer Group, Madison, Wis.) generates a mature polypeptide beginning at amino acid residue 25 (FIG. 5(d,e)). Additional putative cleavage sites that may be used to produce a mature LRP5 protein are predicted at residues 23, 24, 26, 27, 28, 30 and 32 (the first amino acid encoded by exon A).
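SigCleave scores candidate cleavage sites with a position-specific weight matrix (von Heijne's method), which is not reproduced here. A much simpler, related filter is von Heijne's "(−3, −1) rule": residues at positions −1 and −3 relative to the cleavage site tend to be small and uncharged. The sketch below lists positions passing that filter, using the set {A, G, S, C, T} as a simplifying assumption; it is a substitute heuristic for illustration, not SigCleave, and the toy peptide is made up.

```python
SMALL = set("AGSCT")  # simplified set of small residues favoured at -1 and -3


def minus1_minus3_sites(peptide: str) -> list[int]:
    """1-based positions p of a possible first mature residue such that the
    residues at -1 (p - 1) and -3 (p - 3) are both small."""
    peptide = peptide.upper()
    sites = []
    for p in range(4, len(peptide) + 1):
        # 0-based index of residue p - 1 is p - 2; of residue p - 3 is p - 4.
        if peptide[p - 2] in SMALL and peptide[p - 4] in SMALL:
            sites.append(p)
    return sites


print(minus1_minus3_sites("MLLLAGAQP"))
```

Because this filter ignores the hydrophobic core and weights nothing, it typically passes several positions, which is consistent with the text reporting a cluster of alternative candidate sites around the best-scoring one.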

Determination of the Genomic DNA Sequence Containing and Flanking the Signal Peptide Exon

The region containing genomic DNA sequence identical to the cDNA sequence encoding a signal peptide lay in a gap between two stretches of contiguous genomic DNA sequence known as contigs 57 and 58. To close this gap, four clones from the shotgun library that were determined to span the gap, according to analysis by the program Phrapview (licensed from Dr. Phil Green of the University of Washington, Seattle, Wash.), were chosen. Direct DNA sequencing of these clones was unsuccessful because high GC content significantly reduced the efficiency of cycle sequencing. To circumvent this problem, PCR products were generated incorporating 7-deaza-dGTP (Pharmacia Biotech, Piscataway, N.J.). The conditions for these reactions were a modification of the Klentaq Advantage-GC polymerase kit protocol (Clontech, Palo Alto, Calif.): the standard reaction mix was supplemented with 200 uM 7-deaza-dGTP. Inserts were amplified with M13 forward and reverse primers for 32 cycles of 30 sec at 92° C., 1 min at 60° C., and 5 min at 68° C. Products were gel purified using the QIAquick gel extraction kit (Qiagen Inc., Santa Clarita, Calif.) and sequenced as described previously. Assembly of the resulting sequences closed the gap and generated a contiguous sequence of approximately 78,000 bp of genomic DNA.
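Whether a stretch is GC-rich enough to cause the sequencing problems described above can be screened for by computing GC content overall and in sliding windows. The helpers below are a hypothetical sketch; the window size and any cut-off for "high GC" are left to the user, since the text gives no threshold.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def max_window_gc(seq: str, window: int) -> float:
    """Highest GC percentage over all windows of the given length."""
    if not 0 < window <= len(seq):
        raise ValueError("window must fit in sequence")
    return max(gc_percent(seq[i:i + window]) for i in range(len(seq) - window + 1))


print(round(gc_percent("GGCCAT"), 2), max_window_gc("ATATGCGCGC", 4))
```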

Extension of Isoforms 2 and 3

The software package GRAIL (supra) predicts exons and promoter sequences from genomic DNA sequence. One region identified by GRAIL is an exon, originally designated G1 and subsequently termed exon 1, that lies approximately 55 kb upstream of the beginning of exon A (FIG. 12(c) (SEQ ID NO:28)). Three primers, designated G1 1f to 3f, were designed based on this sequence. This exon was of particular interest because GRAIL also predicted a promoter immediately upstream of the exonic sequence (FIG. 12(e)). Furthermore, one of the open reading frames in G1 encoded a peptide that had the characteristics of a eukaryotic signal sequence.

To determine whether the G1 predicted exon was part of the LRP5 gene, reverse transcriptase (RT) PCR was performed using the TaKaRa RNA PCR kit (Panvera, Madison Wis.). Human liver mRNA (50 ng) was used as the template for a 10 ul reverse transcriptase reaction. The reverse transcriptase reaction, primed with one of the LRP5-specific primers (622R, 361R, or 318R), was incubated at 60° C. for 30 min, followed by 99° C. for 5 min, and the sample was then placed on ice. One of the forward primers (G1 1f, 2f, or 3f; Table 2; SEQ ID NOS:75, 76, 77) was added along with the reagents for PCR amplification, and the reaction was amplified for 30 cycles of 30 sec at 94° C., 30 sec at 60° C., and 2 min at 72° C. This primary PCR reaction was then diluted 1:2 in water, and 1 ul of it was used in a second 20 ul reaction with nested primers. The conditions for the second round of amplification were 30 cycles of 94° C. for 30 sec, 60° C. for 30 sec and 72° C. for 2 min. The products were separated on an agarose gel and excised. The purified fragments were subcloned into pCR 2.1 (Invitrogen, Carlsbad, Calif.), plasmid DNA was prepared, and the DNA sequence was determined.

The DNA sequence of these products indicated that G1 (exon 1) was present on at least a portion of the LRP5 transcripts. Two different isoforms were identified. The first, isoform 2 (FIG. 11(a) (SEQ ID NO:23)), consists of exon 1 followed by an exon that we have given the designation exon 5. This splice variant has an open reading frame that initiates at nucleotide 402 of exon B (FIG. 11(a)); the initiator methionine at this location does not conform to the consensus sequences for translation initiation (Kozak, M. (1996) Mammalian Genome 7:563-574). A second potential initiator methionine is present at nucleotide 453; this codon is in a context for efficient translation initiation (Kozak, M. (1996) Mammalian Genome 7:563-574). The longest potential open reading frame for isoform 2 (FIG. 11(c)) encodes a splice variant that contains a eukaryotic signal sequence at amino acid 153. The mature peptide generated by this splice variant would lack the first five spacer domains and a portion of the first EGF-like motif.

The second isoform (isoform 3) consists of exon 1 followed by exon A (FIG. 12(a)). It is not known whether exon 1 is the first exon of isoform 2. However, the location of a GRAIL-predicted promoter upstream of G1 suggests that exon 1 may be the first exon. Furthermore, there is an open reading frame that extends past the 5′ intron/exon boundary postulated by GRAIL (FIG. 12(b)). We have therefore examined the possibility of incorporating this extended open reading frame into the LRP5 transcript. The resulting open reading frame (FIG. 12(c)) encodes a 1639 amino acid protein (FIG. 12(d)). The initiator methionine codon does not contain either of the consensus nucleotides that are thought to be important for efficient translation (Kozak, M. 1996, Mammalian Genome 7:563-574), nor does the predicted protein contain a predicted eukaryotic signal sequence within the first 100 amino acids. Alternatively, there may be additional exons upstream of exon 1 that provide the initiator methionine codon and/or a potential signal sequence.

RACE Extension of the 5′ End of LRP5: Isoforms 4 and 5

RACE (rapid amplification of cDNA ends) is an established protocol for the analysis of cDNA ends. This procedure was performed using the Marathon RACE template purchased from Clontech (Palo Alto, Calif.), according to the manufacturer's instructions, using Clontech "Marathon" cDNA from fetal brain and mammary tissue. Two nested PCR amplifications were performed using the ELONGASE™ long-PCR enzyme mix and buffer from Gibco-BRL (Gaithersburg, Md.).

Marathon Primers

AP1: CCATCCTAATACGACTCACTATAGGGC (SEQ ID NO:407)

AP2: ACTCACTATAGGGCTCGAGCGGC (SEQ ID NO:408)

First-round PCR used 2 microliters of Marathon placenta cDNA template and 10 pmoles each of primers L217 and AP1. Thermal cycling was: 94° C. 30 sec, 68° C. 6 min, 5 cycles; 94° C. 30 sec, 64° C. 30 sec, 68° C. 4 min, 5 cycles; 94° C. 30 sec, 62° C. 30 sec, 68° C. 4 min, 30 cycles. One microliter of a 1/20 dilution of this reaction was added to a second PCR reaction as DNA template. This second reaction differed from the first in that the nested primers L120 and AP2 were used. Two products of approximately 1600 bp and 300 bp were observed and cloned into pCR2.1 (Invitrogen, Carlsbad Calif.). The DNA sequence of these clones indicated that they were generated by splicing of sequences to exon A. The larger 1.6 kb fragment (FIG. 13 (SEQ ID NO:31)) identified a region approximately 4365 nucleotides upstream of exon A that appeared to be contiguous with genomic DNA for 1555 base pairs. The sequence identified by the 300 bp fragment lay approximately 5648 nucleotides upstream of exon A (FIG. 14 (SEQ ID NO:32)) and had similarity to Alu repeats. The region identified by the 300 bp fragment was internal to the region identified by the 1.6 kb fragment. The open reading frame for these isoforms, designated 4 and 5, is the same as described for isoform 2 (FIG. 11(b)).
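The first-round program above steps the annealing temperature down over three stages (68° C., then 64° C., then 62° C.), a pattern commonly called touchdown or step-down PCR. Writing the program as data makes the cycle count and run time easy to check. This is a sketch: the `Stage` class is a hypothetical representation, and the time estimate ignores ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    cycles: int
    steps: list  # (temperature in deg C, hold in seconds) for each step of one cycle


# First-round RACE program transcribed from the paragraph above.
FIRST_ROUND = [
    Stage(5, [(94, 30), (68, 360)]),             # two-step cycles, combined anneal/extend
    Stage(5, [(94, 30), (64, 30), (68, 240)]),
    Stage(30, [(94, 30), (62, 30), (68, 240)]),
]


def total_cycles(program) -> int:
    return sum(stage.cycles for stage in program)


def hold_time_minutes(program) -> float:
    """Total programmed hold time; ramping between temperatures is ignored."""
    seconds = sum(stage.cycles * sum(hold for _, hold in stage.steps) for stage in program)
    return seconds / 60


def annealing_temperatures(program) -> list:
    """Lowest temperature of each stage, i.e. the step-down schedule."""
    return [min(temp for temp, _ in stage.steps) for stage in program]


print(total_cycles(FIRST_ROUND), hold_time_minutes(FIRST_ROUND), annealing_temperatures(FIRST_ROUND))
# 40 207.5 [68, 64, 62]
```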

Extension of Isoform 6

GRAIL (supra) analysis was used to predict potential promoter regions for the gene. Primers were designed to the isoform 6 promoter sequence (FIG. 15(b)), which was defined by GRAIL and lies approximately 4 kb centromeric of exon A. This region was designated GRAIL promoter-1 (Gp-1).

The PCR primer Gp 1f (SEQ ID NO:78) (Table 2) was used in a PCR reaction with primers 574r and 599r, using Taq Gold polymerase in the reaction buffer supplied by the manufacturer (Perkin Elmer, Norwalk, Conn.). The reaction conditions were 12 min at 95° C. followed by 35 cycles of 95° C. for 30 sec, 60° C. for 30 sec, and 72° C. for 1 min 30 sec, with approximately 10 ng of liver cDNA per 20 ul reaction. The primary reactions were diluted 20-fold in water, and a second round of PCR was done using primer Gp 1f in combination with either 474r or 521r. Products were analyzed on a 2% agarose gel, and bands of approximately 220 to 400 bp were subcloned into pCR 2.1 (Invitrogen, Carlsbad, Calif.) and analyzed by DNA sequencing. The open reading frame present in isoform 4 is the same as described for isoform 2 above (FIG. 11(b)).

Microsatellite Rescue

A vectorette library was made from each clone by restricting the clone and ligating on a specific bubble linker (Munroe, D. J. et al. (1994) Genomics 19, 506). PCR was carried out between a primer specific for the linker (Not 1-A) and a repeat-motif primer (AC)11N (where N is not A), at an annealing temperature of 65° C. The PCR products were gel purified and sequenced using the ABI PRISM dye terminator cycle sequencing kit as previously described. From this sequence a primer was designed, which was used in PCR with the Not 1-A primer. This product was also sequenced, and a second PCR primer was designed (Table 8 (SEQ ID NOS:318-333)) so that both primers flanked the repeat motif; these primer pairs were used for genotyping.
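Once vectorette PCR products are sequenced, locating the (AC)n repeat and the sequence on either side of it is the step that precedes flanking-primer design. The sketch below uses a regular expression with the same minimum of 11 AC units as the (AC)11N primer; the function names and the clone sequence are hypothetical, and it searches only the given strand (the same repeat reads as (GT)n on the other strand).

```python
import re


def find_repeats(seq: str, unit: str = "AC", min_units: int = 11):
    """Return (start, end, n_units) for each perfect tandem run of `unit` (0-based, end exclusive)."""
    seq = seq.upper()
    pattern = re.compile(f"(?:{re.escape(unit.upper())}){{{min_units},}}")
    return [(m.start(), m.end(), (m.end() - m.start()) // len(unit)) for m in pattern.finditer(seq)]


def flanks(seq: str, start: int, end: int, size: int = 60):
    """Sequence on either side of a repeat, where flanking primers would be placed."""
    return seq[max(start - size, 0):start], seq[end:end + size]


# Hypothetical clone sequence with a 12-unit (AC) run; not data from this study.
clone = "GGATCCTTGA" + "AC" * 12 + "TTGCAGGATC"
for start, end, n in find_repeats(clone):
    print(start, end, n, flanks(clone, start, end, size=5))
# 10 34 12 ('CTTGA', 'TTGCA')
```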

Mutation Scanning

Single nucleotide polymorphisms (SNP's) were identified in type 1 diabetic patients using a sequence-scanning approach (Table 5).

Primers were designed to specifically amplify genomic fragments, approximately 500 to 800 bp in length, containing specific regions of interest (i.e. regions that contained LRP5 exons, previously identified SNP's or GRAIL-predicted exons). To facilitate fluorescent dye primer sequencing, forward and reverse primers were tailed with sequences that correspond to the M13 Universal primer (5′-TGTAAAACGACGGCCAGT-3′) (SEQ ID NO:409) and a modified M13 reverse primer (5′-GCTATGACCATGATTACGCC-3′) (SEQ ID NO:410), respectively. PCR products were amplified with these primer sets in 50 ul reactions consisting of Perkin-Elmer 10×PCR Buffer, 200 mM dNTP's, 0.5 ul of Taq Gold (Perkin-Elmer Corp., Foster City, Calif.), 50 ng of patient DNA and 20 pmol/ml of forward and reverse primers. Cycling conditions were 95° C. for 12 min; 35 cycles of 95° C. for 30 sec, 57° C. for 30 sec and 68° C. for 2 min; followed by an extension at 72° C. for 6 min and a 4° C. hold.
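The tailing scheme above attaches the M13 sequences to the 5′ ends of the gene-specific primers, so every product carries binding sites for one pair of universal dye primers. A minimal sketch follows; the tail sequences are SEQ ID NOS:409 and 410 from the text, while the gene-specific primers and function names are hypothetical.

```python
M13_UNIVERSAL = "TGTAAAACGACGGCCAGT"          # SEQ ID NO:409
M13_REVERSE_MODIFIED = "GCTATGACCATGATTACGCC"  # SEQ ID NO:410


def tail_primer_pair(forward: str, reverse: str) -> tuple:
    """Add the M13 tails to the 5' ends of a gene-specific primer pair.

    The tails become part of every amplicon, so one pair of universal
    dye-labelled primers can then sequence all products from either end.
    """
    for primer in (forward, reverse):
        if set(primer.upper()) - set("ACGT"):
            raise ValueError(f"non-ACGT base in primer {primer!r}")
    return M13_UNIVERSAL + forward.upper(), M13_REVERSE_MODIFIED + reverse.upper()


def amplicon_length(target_length: int) -> int:
    """Tailed product length, where target_length spans both gene-specific primers."""
    return target_length + len(M13_UNIVERSAL) + len(M13_REVERSE_MODIFIED)


# Hypothetical gene-specific primers, for illustration only.
fwd, rev = tail_primer_pair("cagtgctagcatgcatgc", "gtcagtcagtcatgcatg")
print(fwd)
print(rev)
print(amplicon_length(650))  # a target within the 500-800 bp range used above
```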

Conditions were optimized so that only single DNA fragments were produced by these reactions. The PCR products were then purified for sequencing using QiaQuick strips or QiaQuick 96-well plates on the Qiagen robot (Qiagen Inc., Santa Clarita, Calif.). This purification step removes unincorporated primers and nucleotides.

Direct BODIPY dye primer cycle sequencing was used to analyze the PCR products (Metzker et al. (1996) Science 271, 1420-1422). A Tecan robot (Tecan, Research Triangle Park, N.C.) carried out the sequencing reactions using standard dye primer sequencing protocols (ABI Dye Primer Cycle Sequencing with AmpliTaq DNA Polymerase FS, Perkin-Elmer Corp., Foster City, Calif.). The reactions were run on a DNA Engine thermal cycler (M. J. Research Inc., Watertown, Mass.) with the following cycling conditions: 15 cycles of 95° C. for 4 sec, 55° C. for 10 sec, and 70° C. for 60 sec; followed by 15 cycles of 95° C. for 4 sec and 70° C. for 60 sec. After cycling, samples were pooled, precipitated and dried down. The samples were resuspended in 3 ul of loading buffer, and 2 ul were run on an ABI 377 Automated DNA sequencer.

Once SNP's have been identified, scanning technologies are employed to evaluate their informativeness as markers to assist in determining the association of the gene with disease in the type 1 diabetic families. We are using restriction fragment length polymorphisms (RFLP's) to assess SNP's that change a restriction endonuclease site. Furthermore, we are using forced RFLP PCR (Li and Hood (1995) Genomics 26, 199-206; Haliassos et al. (1989) Nuc. Acids Res. 17, 3608) and ARMS (Gibbs et al. (1989) Nuc. Acids Res. 17, 2437-2448; Wu et al. (1989) Proc. Natl. Acad. Sci. USA 86, 2757-2760) to evaluate SNP's that do not change a restriction endonuclease site. We are also trying to scan larger regions of the locus by developing fluorescence-based Cleavase (CFLP) (Life Technologies, Gaithersburg, Md.) and Resolvase (Avitech Diagnostics, Malvern, Pa.) assays.
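Deciding whether a SNP can be typed by plain RFLP, or instead needs forced RFLP or ARMS, comes down to whether one allele creates or destroys a recognition site. The sketch below checks an exact site on one strand only; that suffices for palindromic sites such as EcoRI (GAATTC), but a non-palindromic site would also need its reverse complement searched. The allele sequences are hypothetical.

```python
def site_starts(seq: str, site: str) -> set:
    """0-based start positions of an exact recognition sequence (one strand only)."""
    seq, site = seq.upper(), site.upper()
    return {i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site}


def snp_effect_on_site(ref_seq: str, snp_index: int, alt_base: str, site: str) -> str:
    """Classify a SNP as creating, destroying or not affecting a restriction site."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]

    def covering(starts):
        # Only occurrences that overlap the SNP position can differ between alleles.
        return {s for s in starts if s <= snp_index < s + len(site)}

    in_ref = bool(covering(site_starts(ref_seq, site)))
    in_alt = bool(covering(site_starts(alt_seq, site)))
    if in_ref and not in_alt:
        return "destroys site"
    if in_alt and not in_ref:
        return "creates site"
    return "no change"


# Hypothetical allele pair screened against the EcoRI site GAATTC.
print(snp_effect_on_site("TTGAATTCAA", 5, "C", "GAATTC"))  # destroys site
print(snp_effect_on_site("TTGAACTCAA", 5, "T", "GAATTC"))  # creates site
```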

Haplotype Analysis at IDDM4

Haplotype mapping (or identity-by-descent mapping) has been used in conjunction with association mapping to identify regions of identity-by-descent (IBD) in founder populations, where some of the affected individuals share not only the mutation but also a fairly large genomic haplotype (hence an identical piece of DNA) surrounding the disease locus. Recombinant haplotypes can be used to delineate the region containing the mutation. These methods have been used to map the genes for the recessive disorders Wilson's disease, Batten's disease, Hirschsprung's disease and hereditary haemochromatosis (Tanzi, R., et al. (1993) Nature Genet 5, 344-350; The International Batten Disease Consortium (1995) Cell 82, 949-957; Puffenberger, E., et al. (1994) Hum Mol Genet 3, 1217-1225; and Feder, J., et al. (1996) Nature Genet 13, 399-408). Similarly, in type 1 diabetes, for IDDM1, comparative MHC haplotype mapping between specific Caucasian haplotypes and haplotypes of African origin identified both HLA-DQA1 and HLA-DQB1 as susceptibility loci for this disorder (Todd, J. et al (1989) Nature 338, 587-589; and Todd, J. et al (1987) Nature 329, 599-604).

On chromosome 11q13, haplotype analysis was undertaken in conjunction with association analysis to identify regions of IBD between haplotypes that are transmitted more often than expected, and hence contain a susceptible allele at the aetiological locus; in contrast, protective haplotypes will be transmitted less often than expected and contain a different (protective) allele at the aetiological locus. Evidence for a deviation in the expected transmission of alleles was shown with the two polymorphic markers D11S1917 and H0570POLYA. In 2042 type 1 diabetic families from the UK, USA, Norway, Sardinia, Romania, Finland, Italy and Denmark, the D11S1917-H0570POLYA haplotype 3-2 was under-transmitted to affected offspring (%T=46), and a 2×2 test of heterogeneity between affected and unaffected transmissions gave χ²=23, df=1, p<5×10⁻⁶, providing good evidence that this is a protective haplotype. In contrast, the 2-3 haplotype was transmitted more often to affected than to non-affected offspring (%T=51.3; 2×2 contingency test; χ²=5.5, df=1, p<0.02), indicating that this is a susceptible (or possibly neutral) chromosome. A further, rare haplotype has been identified that appears to confer susceptibility to type 1 diabetes (D11S1917-H0570POLYA 3-3, %T affecteds=62.4; 2×2 contingency test, affecteds vs non-affecteds; χ²=6.7, df=1, p<0.009). Therefore, association analysis in this region has produced evidence for a haplotype containing an allele protective against type 1 diabetes, as it is significantly less transmitted to affected offspring than to unaffected offspring, and evidence for two non-protective haplotypes, which have a neutral or susceptible effect on type 1 diabetes.
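The statistics quoted above are a percentage transmitted (%T) and a 2×2 contingency chi-square comparing transmissions to affected and unaffected offspring. The sketch below computes both, assuming a Pearson chi-square without continuity correction (the text does not state which variant was used). The counts are hypothetical and are not the study's data.

```python
import math


def percent_transmitted(transmitted: int, untransmitted: int) -> float:
    """%T: share of transmissions in which the haplotype was passed on."""
    return 100.0 * transmitted / (transmitted + untransmitted)


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: affected, unaffected offspring. Columns: transmitted, untransmitted.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value for chi-square with 1 df: P(|Z| > sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts (NOT the study's data): the haplotype is transmitted in 460
# of 1000 transmissions to affected offspring and 520 of 1000 to unaffected ones.
a, b, c, d = 460, 540, 520, 480
chi2 = chi2_2x2(a, b, c, d)
print(percent_transmitted(a, b), round(chi2, 2), round(p_value_1df(chi2), 4))
```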

Extending this haplotype analysis to include the 14 flanking microsatellite markers 255ca5, D11S987, 255ca6, 255ca3, D11S1296, E0864CA, TAA, L3001CA, D11S1337, 14LCA5, D11S4178, D11S970, 14LCA1 and 18O18, as well as the single nucleotide polymorphisms (SNPs) 58-1, Exon E (intronic, 8 bp 3′ of exon 6) and Exon R (Ala¹³³⁰, exon 18) (FIG. 19), revealed highly conserved haplotypes within this interval in the diabetic individuals. A distinct protective haplotype (A) has been identified (encompassing the 3-2 haplotype at D11S1917-H0570POLYA), as well as a distinct susceptible haplotype (B) (encompassing the 2-3 haplotype at D11S1917-H0570POLYA). The susceptible haplotype is IBD with the protective haplotype 3′ of marker D11S1337, indicating that the aetiological variant playing a role in type 1 diabetes does not lie within this identical region and localising it 5′ of Exon E of the LRP5 gene. Because this region is IBD between the protective and susceptible haplotypes, association analysis cannot be undertaken there, as no deviation in transmission to affected offspring would be detected. The rare susceptible haplotype (C), 3-3 at D11S1917-H0570POLYA, can also be identified. Haplotype analysis with the additional markers in the region reveals that this rare susceptible haplotype is identical to the susceptible haplotype between UT5620 and 14L15CA, potentially localising the aetiological variant to the approximately 100 kb interval between UT5620 and Exon E. Therefore, the susceptible and rare susceptible haplotypes may carry an allele (or separate alleles) conferring susceptibility to type 1 diabetes, whereas the protective haplotype contains an allele protective against IDDM. The 5′ region of the LRP5 gene lies within this interval, encompassing the 5′ regulatory regions of the LRP5 gene and exons 1 to 6.

Analysis of the Italian and Sardinian haplotypes revealed two additional susceptible haplotypes: in the Italian families, haplotype 1-3 at D11S1917-H0570POLYA (63%T; 2×2 affecteds versus non-affecteds, p=0.03) (haplotype D); and in the Sardinian families, haplotype 1-2 at H0570POLYA-L3001 (58%T; 2×2 affecteds versus non-affecteds, p=0.05) (haplotype E).

Samples containing the above five haplotypes were genotyped with SNPs from the IDDM4 region in order to investigate regions of IBD (FIG. B). These SNPs confirmed the region of IBD between the susceptible haplotypes B and C between UT5620 and 14L15CA. They also confirmed the region of IBD between the protective and susceptible haplotypes A and B 3′ of marker D11S1337, excluding this region from containing the aetiological variant. The SNP analysis also revealed a potential 25 kb region of IBD between UT5620 and TAA that is shared by the susceptible haplotypes B, C, D and E and is distinct from the protective haplotype A. The marker H0570POLYA lies within this interval and is not identical in haplotype E compared with the other susceptible haplotypes; this may be due to mutation at this polymorphism, or it may delineate a boundary within this region, with the aetiological variant lying either 5′ or 3′ of this marker. Further analysis of additional SNPs within this interval will be necessary.
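The comparisons above amount to finding stretches of consecutive markers at which two haplotypes carry the same alleles. Identity at the markers is consistent with, but does not prove, identity by descent. The sketch below finds such maximal runs; the marker names and allele codes are hypothetical, and markers must be supplied in map order.

```python
def shared_runs(markers, hap1, hap2, min_markers=2):
    """Maximal runs of consecutive markers at which two haplotypes carry the same allele.

    Markers must be listed in map order. A missing allele (None) breaks a run,
    which keeps the estimate of the shared segment conservative.
    """
    if not (len(markers) == len(hap1) == len(hap2)):
        raise ValueError("markers and haplotypes must have equal length")
    runs, start = [], None
    for i, (x, y) in enumerate(zip(hap1, hap2)):
        if x is not None and x == y:
            if start is None:
                start = i
            continue
        if start is not None and i - start >= min_markers:
            runs.append((markers[start], markers[i - 1]))
        start = None
    if start is not None and len(markers) - start >= min_markers:
        runs.append((markers[start], markers[-1]))
    return runs


# Hypothetical markers and alleles for two haplotypes; not genotypes from this study.
markers = ["M1", "M2", "M3", "M4", "M5", "M6"]
hap_b = [3, 2, 1, 5, 4, 2]
hap_c = [1, 2, 1, 5, 3, 2]
print(shared_runs(markers, hap_b, hap_c))  # [('M2', 'M4')]
```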

Therefore, haplotype mapping within the IDDM4 region has identified a 100 kb region of IBD between the susceptible haplotypes B and C in the 5′ region of the LRP5 gene. SNP haplotype mapping has possibly further delineated this to a 25 kb interval encompassing the 5′ region of LRP5, which includes possible regulatory sequences for this gene (a putative promoter and regions of homology with the mouse syntenic region (Table 12)) as well as exon 1 of LRP5.

Construction of Adenovirus Vectors Containing LRP5

The full-length human LRP5 gene was cloned into the adenovirus transfer vector pdelE1sp1A-CMV-bGHPA, which contains the human cytomegalovirus immediate early promoter and the bovine growth hormone polyadenylation signal, to create pdehlrp3. This vector was used to construct an adenovirus containing the LRP5 gene inserted into the E1 region of the virus, directed towards the 5′ ITR. To accommodate a cDNA of this length, the E3 region has been completely deleted from the virus as described for pBHG10 (Bett et al. 1994 Proc Natl Acad Sci 91:8802-8806). An identical strategy was used to construct an adenoviral vector containing the full-length mouse Lrp5 gene.

A soluble version of mouse Lrp5 was constructed in which a His tag and a translational stop signal replaced the putative transmembrane spanning domain (primers listed in Table 9 (SEQ ID NOS:49-74, 334-402)). This should result in secretion of the extracellular domain of Lrp5 and facilitate biochemical characterization of the putative ligand binding domain of Lrp5. Similarly, a soluble version of human LRP5 can be constructed using primers shown in Table 9 (SEQ ID NOS:49-74, 334-402). The extracellular domain runs to amino acid 1385 of the precursor (immature) protein sequence.
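At the sequence level, the soluble construct keeps the coding sequence up to the last extracellular codon and then appends six histidine codons and a stop codon. The sketch below models only that sequence edit; the actual construct was made by PCR with the Table 9 primers, and the exact tag codons, any linker and the toy CDS here are assumptions.

```python
HIS6 = "CATCACCATCACCATCAC"  # six histidine codons (CAT and CAC both encode His)
STOP = "TAA"


def soluble_construct(cds: str, last_codon: int) -> str:
    """Keep codons 1..last_codon of an in-frame coding sequence, then add His6 and a stop.

    Everything after `last_codon` (e.g. transmembrane and cytoplasmic regions)
    is dropped, so the encoded protein ends in six histidines.
    """
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence should begin with the ATG initiator")
    end = 3 * last_codon
    if end > len(cds):
        raise ValueError("last_codon lies beyond the end of the coding sequence")
    return cds[:end] + HIS6 + STOP


# Hypothetical toy CDS (Met + 4 Ala + 3 Trp + stop). For human LRP5 the text places
# the end of the extracellular domain at residue 1385 of the precursor.
toy = "ATG" + "GCT" * 4 + "TGG" * 3 + "TAA"
construct = soluble_construct(toy, 5)
print(construct, len(construct))
```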

Identification of LRP5 Ligands

LRP5 demonstrates the ability to bind and take up LDL (see below), but this activity is not high. It is therefore likely that LRP5 has the capacity to bind additional ligand(s). To identify LRP5 ligands, the extracellular domain consisting of the first 1399 amino acids of human LRP5, or the corresponding region of mouse Lrp5, will be purified. A number of expression systems can be used; these include plasmid-based systems in Drosophila S2 cells, yeast and E. coli, and viral-based systems in mammalian cells and SF9 insect cells. A histidine tag will be used to purify LRP5 on a nickel column (Novagen, Madison Wis.). A variety of resins may be used in column chromatography to further enrich soluble LRP5. LRP5 will be attached to a solid support, e.g. a nickel column. Solutions containing ligands, such as serum fractions, urine fractions, or fractions from tissue extracts, will be fractionated over the LRP5 column. LRP5 complexed with bound ligand will be eluted from the nickel column with imidazole. The nature of the ligand(s) bound to LRP5 will be characterized by gel electrophoresis, amino acid sequencing, amino acid composition, gas chromatography, and mass spectrometry.

Purified LRP5 attached to a BiaCore 2000 (BiaCore, Uppsala Sweden) chip will be used to determine whether ligands that bind to LRP5 are present in test solutions. Once ligands for LRP5 are identified, the LRP5 chip will be used to characterize the kinetics of the LRP5-ligand interaction.
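Surface plasmon resonance kinetics are commonly interpreted with a 1:1 (Langmuir) binding model, in which the association rate constant ka, the dissociation rate constant kd and the equilibrium constant K_D = kd/ka describe the sensorgram. The sketch below generates idealized association and dissociation curves under that model; whether an LRP5 ligand binds 1:1 is an assumption, and the rate constants are hypothetical.

```python
import math


def equilibrium_kd(ka: float, kd: float) -> float:
    """Equilibrium dissociation constant K_D = kd / ka (M)."""
    return kd / ka


def association(t: float, conc: float, ka: float, kd: float, rmax: float) -> float:
    """Response during injection of analyte at `conc` (M), 1:1 Langmuir model."""
    k_obs = ka * conc + kd
    r_eq = rmax * conc / (conc + equilibrium_kd(ka, kd))
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation(t: float, r0: float, kd: float) -> float:
    """Response after the injection ends, starting from response r0."""
    return r0 * math.exp(-kd * t)


# Hypothetical rate constants: ka = 1e5 /M/s, kd = 1e-3 /s, so K_D = 10 nM.
ka, kd, rmax = 1e5, 1e-3, 100.0
for t in (0, 60, 300, 900):
    print(t, round(association(t, 1e-8, ka, kd, rmax), 1))
```

At an analyte concentration equal to K_D, the response plateaus at half of rmax; fitting measured curves to these expressions yields ka and kd.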

Adenoviral vectors containing soluble versions of LRP5 will be used to infect animals; isolation of ligand/LRP5 complexes from serum or liver extracts will be facilitated by the use of a histidine tag and antibodies directed against this portion of LRP5.

Treatment of Animals with LRP5 virus

A wide range of species may be treated with adenovirus vectors carrying a transgene. Mice are the preferred species for these experiments because of the availability of a number of genetically altered strains, e.g. knockout, transgenic and inbred mice. However, larger animals, e.g. rats or rabbits, may be used when appropriate. A preferred animal model to test the ability of LRP5 to modify the development of type 1 diabetes is the non-obese diabetic (NOD) mouse. Preferred animal models for examining a potential role for LRP5 in lipoprotein metabolism are mice in which members of the LDL-receptor family have been disrupted, e.g. the LDL-receptor (LDLR), or in which genes involved in lipoprotein metabolism, e.g. Apo-E, have been disrupted.

Adenoviruses are administered by injecting approximately 1×10⁹ plaque forming units into the tail vein of a mouse. Based on previous studies, this form of treatment results in infection of hepatocytes at a relatively high frequency. Three different adenovirus treatments were prepared: 1.) adenovirus containing no insert (negative control), 2.) adenovirus containing human LDLR (positive control), or 3.) adenovirus containing human LRP5. Each of these viruses was used to infect five C57 wild type and five C57 LDLR knockout mice. A pretreatment bleed 8 days prior to injection of the virus was used to examine serum chemistry values before treatment. The animals were then injected with virus. On day five following administration of the virus, a second (treatment) bleed was taken and the animals were euthanized for collection of serum for lipoprotein fractionation. In addition, tissues were harvested for in situ analysis, immunohistochemistry, and histopathology.

Throughout the experiment, animals were maintained on a standard light/dark cycle and given a regular chow diet. The animals were fasted prior to serum collection. In certain experimental conditions it may be desirable to give animals a high fat diet.

Standard clinical serum chemistry assays were performed to determine serum triglycerides, total cholesterol, alkaline phosphatase, aspartate aminotransferase, alanine aminotransferase, urea nitrogen, and creatinine. Hematology was performed to examine the levels of circulating leukocytes, neutrophils, the percent lymphocytes, monocytes, and eosinophils, erythrocytes, platelets, hemoglobin, and percent hematocrit.

Serum lipoproteins were fractionated into size classes using a Superose 6 FPLC sizing column and minor modifications of the procedure described by Gerdes et al. (Clin. Chim. Acta 205:1-9 (1992)), the most significant difference from the Gerdes procedure being that only one column was used. Column fractions were collected and analyzed for cholesterol and triglyceride. The "area under the curve" was calculated for each lipoprotein class. The approximate peak fractions that correspond to each of the classes defined by density are: fraction 24 for VLDL, fraction 36 for LDL and fraction 51 for HDL.
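The "area under the curve" for each lipoprotein class can be obtained by trapezoidal integration of the cholesterol (or triglyceride) values across the fractions assigned to that class. In the sketch below, the fraction windows are assumptions chosen to bracket the peak fractions given above (24, 36 and 51); the text does not state the boundaries actually used.

```python
def trapezoid_area(values) -> float:
    """Area under evenly spaced points (spacing = one fraction)."""
    return sum((values[i] + values[i + 1]) / 2 for i in range(len(values) - 1))


# Hypothetical fraction windows bracketing the peak fractions given above
# (VLDL ~24, LDL ~36, HDL ~51); the boundaries are assumptions.
CLASS_WINDOWS = {"VLDL": (18, 30), "LDL": (31, 44), "HDL": (45, 58)}


def class_areas(profile: dict, windows: dict = CLASS_WINDOWS) -> dict:
    """profile maps fraction number -> cholesterol or triglyceride in that fraction."""
    return {
        name: trapezoid_area([profile.get(f, 0.0) for f in range(lo, hi + 1)])
        for name, (lo, hi) in windows.items()
    }


flat = {f: 1.0 for f in range(1, 61)}
print(class_areas(flat))  # {'VLDL': 12.0, 'LDL': 13.0, 'HDL': 13.0}
```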

LRP5 Overexpression Affects Serum Triglycerides and Lipoproteins

Statistical analysis of serum chemistry data indicated that, relative to control virus, there was a 30% decrease (p value=0.025) in triglyceride levels in animals treated with LRP5-containing virus (Table 10). This decrease in triglycerides occurred at a similar level in both wild type and KO mice. By comparison, the LDLR virus reduced serum triglycerides approximately 55% relative to the control virus. This result indicates that LRP5 has the potential to modulate serum triglyceride levels.

The serum lipoprotein profile indicated that the VLDL particle class was decreased in wild type mice treated with LRP5 virus. Although the number of samples analyzed was not sufficient for statistical analysis, this result is consistent with the observed decrease in serum triglycerides. These results suggest that LRP5 has the potential to bind and internalize lipid-rich particles, causing the decrease in serum triglycerides and VLDL particles. Therefore, treatment with LRP5 or with therapeutic agents that increase the expression of LRP5 or the biological activity of LRP5 may be useful in reducing lipid-rich particles and triglycerides in patients with diseases that increase triglyceride levels, e.g. type 2 diabetes and obesity.

Although not statistically significant, there was a trend towards a reduction in serum cholesterol levels as a consequence of LRP5 treatment (28%, p=0.073) in mice that have a high level of serum cholesterol (approximately 220 mg/dL) due to disruption (knockout) of the LDL-receptor (Table 10). An opposite trend, in which LRP5 treatment elevated serum cholesterol (30%, p=0.08), was observed in wild type mice, which have a relatively low level of serum cholesterol (approximately 70 mg/dL). The small treatment groups (n=4) in these data sets limit the interpretation of these results and indicate that further experimentation is necessary. Nevertheless, these results suggest that in a state of elevated cholesterol an increase in the activity of LRP5 might reduce serum cholesterol levels. Therefore, treatment with LRP5 or with therapeutic agents that increase either the expression of LRP5 or the biological activity of LRP5 may be useful in reducing cholesterol in patients with hypercholesterolemia.

LRP5 Overexpression May Affect Serum Alkaline Phosphatase Levels

Serum alkaline phosphatase levels can be dramatically elevated, e.g. a 20-fold increase, as a consequence of obstruction of the bile duct (Jaffe, M. S. and McVan, B., 1997, Davis's laboratory and diagnostic test handbook, pub. F. A. Davis, Philadelphia Pa.). However, smaller elevations of alkaline phosphatase, up to a three-fold increase, can result from the inflammatory response that takes place in response to an infectious agent in the liver, e.g. adenovirus. In animals treated with a control virus there was an approximately 2-fold increase in alkaline phosphatase levels. In contrast, there was only a slight increase in alkaline phosphatase levels in animals treated with the LRP5 virus. Relative to the control, the alkaline phosphatase level was reduced 49% in the LRP5-treated animals, p value=0.001 (Table 10).

The increase in alkaline phosphatase levels may be a consequence of the level of infection with the adenovirus; therefore, a possible explanation for the decrease in the animals treated with the LRP5 virus may simply be less virus in this treatment group. An indicator of the level of viral infection is the appearance in the serum of the liver enzymes aspartate aminotransferase and alanine aminotransferase. These enzymes are normally found in the cytoplasm of cells and are elevated in the serum when cellular damage occurs (Jaffe, M. S. and McVan, B., 1997, Davis's laboratory and diagnostic test handbook, pub. F. A. Davis, Philadelphia Pa.). These enzymes therefore serve as markers for the level of toxicity resulting from the adenoviral infection. They are present at a normally low level prior to infection and in animals that did not receive virus. Importantly, the levels of aspartate aminotransferase and alanine aminotransferase are higher in the animals given the LRP5 virus, indicating that these animals have more cellular damage, and thus a more extensive infection, than the animals given the control virus (Table 11). It is therefore unlikely that the reduced level of alkaline phosphatase is simply owing to less LRP5 virus being administered. A second possible explanation is that LRP5 modifies the nature of the inflammatory response resulting from the adenovirus infection. A possible role for LRP5 in modulating the inflammatory response is consistent with the genetic data indicating that this gene is associated with risk for developing type 1 diabetes. Chronic insulitis, or inflammation, is a precursor to clinical onset of type 1 diabetes; therefore, LRP5 treatment or treatment with therapeutic agents that increase the transcription of LRP5 may be of utility in preventing type 1 diabetes. Type 1 diabetes is an autoimmune disease; therefore, treatment with LRP5 or with therapeutic agents that increase either the expression of LRP5 or the biological activity of LRP5 may be useful in treating other autoimmune diseases.

Expression of LRP5 in Cell Lines

Overexpression of LRP5 under the control of a heterologous promoter can be accomplished either by infection with an adenovirus containing LRP5 or by transfection with a plasmid vector containing LRP5. Transfection with a plasmid vector can lead to either transient or stable expression of the transgene.

Endogenous LDL-receptors reduce the ability to detect the uptake of LDL by other members of the LDL-receptor family. To study lipoprotein uptake in the absence of the LDL-receptor, primary cell lines from human patients with familial hypercholesterolemia (FH) were used. These FH cell lines lack any endogenous LDL-receptor. FH fibroblasts were infected at an MOI of 500 plaque forming units per cell for 24 hours at 37° C. Following infection, cells were incubated with 40 μg/ml ¹²⁵I-LDL at 37° C. After 4 hours, cells were washed and uptake of LDL was measured. A modest (approximately 60%) increase in the level of LDL uptake was observed. By comparison, infection of FH cells with an adenovirus containing the LDL-receptor resulted in a 20-fold increase in LDL uptake (p<0.0001, n=3). To determine whether this modest level of activity mediated by LRP5 was statistically significant, 24 individual wells were infected with LRP5 virus and analyzed. Statistical analysis of this experiment indicated that the increase in LDL uptake was highly significant, p<0.0001. Therefore, LRP5 can mediate LDL uptake. However, based on its modest level of activity relative to the LDL-receptor, it does not appear that the primary activity of LRP5 is to mediate the uptake of LDL.

Additional cell lines exist that lack either the LDL-receptor or other members of the LDL-receptor family. The PEA-13 cell line (ATCC 2216-CRL) lacks the LRP1 receptor. Mutant CHO cells lacking the LDL receptor have been described by Kingsley and Krieger (Proceedings National Academy Sciences USA (1984) 81:5454). This cell line, known as ldlA7, is particularly useful for the creation of stable transfectant cell lines expressing recombinant LRP5.

Anti-LRP5 Antibodies

Western Blot Analysis

Antisera prepared in rabbits immunized with the human LRP5 MAP peptides

SYFHLFPPPPSPCTDSS (SEQ ID NO:403)

VDGRQNIKRAKDDGT (SEQ ID NO:404)

EVLFTTGLIRPVALVVDN (SEQ ID NO:405)

IQGHLDFVMDILVFHS (SEQ ID NO:406)

were evaluated by Western blot analysis.

COS cells were infected with an adenovirus containing human LRP5 cDNA. Three days after infection, the cells were harvested by scraping into phosphate buffered saline (Gibco/BRL Gaithersburg, Md.) containing the protease inhibitors PMSF (100 ug/ml), aprotinin (2 ug/ml), and pepstatin A (1 ug/ml). The cells were pelleted by a low speed spin, resuspended in phosphate buffered saline containing protease inhibitors and lysed by Dounce homogenization. Nuclei were removed with a low speed spin, 1000 rpm for 5 min in a Beckman J-9 rotor. The supernatant was collected and centrifuged at high speed, 100,000×g for 3 hours, to pellet the membranes. Membranes were resuspended in SDS-sample buffer (Novex, San Diego Calif.).

Membrane proteins were fractionated by electrophoresis on a 10% Tris-glycine acrylamide gel (Novex, San Diego Calif.). The fractionated proteins were transferred to a PVDF membrane (Novex, San Diego Calif.) according to the manufacturer's instructions. Standard Western blot analysis was performed on the membrane, with the primary antibody being a 1:200 dilution of crude antisera and the secondary antibody a 1:3000 dilution of anti-rabbit IgG HRP conjugate (Amersham, Arlington Heights, Ill.). ECL reagents (Amersham, Arlington Heights, Ill.) were used to visualize proteins recognized by the antibodies present in the sera.

A band of approximately 170-180 kD was detected by serum from a rabbit immunized with the peptide SYFHLFPPPPSPCTDSS (SEQ ID NO:403). This band was detected only in cells that were infected with the adenovirus containing human LRP5 and was not present in cells that were infected with a control virus. Furthermore, detection of this band was blocked by preadsorbing a 1:500 dilution of the serum with 0.1 µg/ml of the peptide SYFHLFPPPPSPCTDSS (SEQ ID NO:403) but not with 0.1 µg/ml of the peptide VDGRQNIKRPAKDDGT (SEQ ID NO:404). Therefore this protein band of approximately 170 kD, detected by the antibody directed against the peptide SYFHLFPPPPSPCTDSS (SEQ ID NO:403), is human LRP5. The predicted size of the mature human LRP5 protein is 176 kD.
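To relate the 0.1 µg/ml blocking concentration to a molar amount, the average mass of the free peptide can be estimated by summing standard average residue masses and adding one water. The sketch below does this for SEQ ID NO:403. It treats the peptide as a linear, unmodified chain and ignores the MAP branching core, so it is an approximation; summing over a full-length sequence in the same way is one route to a sequence-based size prediction such as the 176 kD figure, before any glycosylation is taken into account.

```python
# Standard average residue masses in Da (amino acid minus H2O).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

def peptide_mass(seq: str) -> float:
    """Average mass (Da) of a linear, unmodified peptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

def ug_per_ml_to_um(conc_ug_per_ml: float, mass_da: float) -> float:
    """Convert a mass concentration (ug/ml) to micromolar."""
    # ug/ml equals mg/l; mg/l divided by g/mol gives mmol/l; x1000 gives umol/l.
    return conc_ug_per_ml / mass_da * 1000.0

pep = "SYFHLFPPPPSPCTDSS"  # SEQ ID NO:403
m = peptide_mass(pep)
print(f"{m:.1f} Da, {ug_per_ml_to_um(0.1, m):.3f} uM at 0.1 ug/ml")
```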

The antiserum from a rabbit immunized with the peptide SYFHLFPPPPSPCTDSS (SEQ ID NO:403) was affinity purified on an Affi-Gel 10 column (Bio-Rad, Hercules, Calif.) to which the MAP peptide SYFHLFPPPPSPCTDSS (SEQ ID NO:403) was covalently attached. This yields antiserum with greater specificity for LRP5.

The antiserum from a rabbit immunized with the peptide IQGHLDFVMDILVFHS (SEQ ID NO:406) detects a band of approximately 170 kD that is present in cells infected with an LRP5-containing virus but not in cells infected with a control virus. This antibody recognizes a peptide located in the putative extracellular domain of LRP5 and thus should be useful for detecting the soluble version of LRP5. However, greater background is observed with this antiserum than with that from the rabbit immunized with the peptide SYFHLFPPPPSPCTDSS (SEQ ID NO:403).

LRP5 is Expressed in Tissue Macrophages

The crude and affinity-purified antisera to the LRP5 peptide SYFHLFPPPPSPCTDSS (SEQ ID NO:403) were used for immunocytochemistry studies in human liver. The antibody recognized tissue macrophages, termed Kupffer cells in the liver, that stained positive for LRP5, positive for the marker RFD7 (Harlan Bioproducts, Indianapolis, Ind.), which recognizes mature tissue phagocytes, and negative for the MHC class II marker RFD1 (Harlan Bioproducts, Indianapolis, Ind.). This pattern of staining (RFD1−RFD7+) identifies a subpopulation of macrophages, the effector phagocytes. This class of macrophages has been implicated in the progression of disease in a model of autoimmune disease, experimental autoimmune neuritis (Jung, S., et al., 1993, J. Neurol. Sci. 119: 195-202). The expression in phagocytic tissue macrophages supports a role for LRP5 in modulating the inflammatory component of the immune response. This result is consistent with the role proposed on the basis of the differences in alkaline phosphatase levels observed in animals treated with LRP5 virus, and with the genetic data indicating that LRP5 is a diabetes risk gene.

Determination of Additional Conserved Regions of the LRP5 Gene

High-throughput DNA sequencing of shotgun libraries prepared from mouse BAC clones 131-p-15 and 53-d-8 was used to identify regions of the LRP5 gene that are conserved between mouse and man. To identify these regions, the mouse genomic DNA, either unassembled sequences or assembled contigs, was compared against an assembly of human genomic DNA using the BLAST algorithm with a cutoff of 80%. This analysis identified the majority of the exons of the LRP5 gene, as well as a number of patches of conserved sequence at other locations in the gene (Table 12).
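A filtering step of this kind can be sketched as follows. The sketch assumes the comparison produced BLAST+ tabular output (-outfmt 6, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the 80% cutoff refers to percent identity; neither detail is stated in the text. The example rows are invented for illustration.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass

@dataclass
class Hit:
    query: str      # mouse read or contig
    subject: str    # human genomic assembly
    pident: float   # percent identity
    length: int     # alignment length (bp)
    sstart: int     # start on the human sequence
    send: int       # end on the human sequence

def conserved_hits(blast_tab: str, min_identity: float = 80.0) -> list[Hit]:
    """Keep -outfmt 6 hits at or above a percent-identity cutoff."""
    hits = []
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        pident = float(row[2])
        if pident >= min_identity:
            hits.append(Hit(row[0], row[1], pident, int(row[3]),
                            int(row[8]), int(row[9])))
    return hits

# Invented example rows: the first and third pass the cutoff.
example = (
    "mm_read1\thsLRP5\t92.5\t197\t15\t0\t1\t197\t41707\t41903\t1e-80\t300\n"
    "mm_read2\thsLRP5\t76.0\t120\t29\t0\t1\t120\t50000\t50119\t1e-20\t90\n"
    "mm_read3\thsLRP5\t85.1\t140\t21\t0\t5\t144\t41850\t41989\t1e-50\t200\n"
)
for h in conserved_hits(example):
    print(h.query, h.pident, h.sstart, h.send)
```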

Sequences conserved between human and mouse are located 4.3 kb and 168 bp upstream of the putative ATG. These sequences may represent 5′ untranslated sequences of the mRNA transcript or promoter elements.

Within the putative first intron of 36 kb there are twelve patches that exhibit a degree of DNA sequence conservation. Some of these regions, e.g. 41707-41903, are quite extensive and have a high degree of sequence conservation, similar to that observed for the exons of the LRP5 gene. Since these regions do not appear to be transcribed, it is likely that they play a role in regulating either the transcription of the LRP5 gene or the processing of the LRP5 mRNA transcript. Regardless of the exact nature of their role, these newly identified regions represent areas where sequence polymorphism may affect the biological activity of LRP5.
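Individual alignment hits often overlap, so counting conserved patches such as those in the first intron implies merging overlapping or nearly adjacent hit coordinates on the human sequence. The sketch below shows one way to do this. The 10 bp merge gap and all input intervals except the 41707-41903 span quoted above are assumptions for illustration.

```python
def merge_patches(intervals, max_gap=10):
    """Merge (start, end) intervals (1-based, inclusive) that overlap or
    lie within max_gap bp of each other; return (start, end, length)."""
    # Normalise orientation so start <= end (reverse-strand hits are flipped).
    spans = sorted((min(s, e), max(s, e)) for s, e in intervals)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + max_gap + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e, e - s + 1) for s, e in merged]

# Hypothetical hits; the first two overlap and merge into one patch.
hits = [(41707, 41903), (41850, 41989), (45200, 45120), (52010, 52100)]
for start, end, length in merge_patches(hits):
    print(f"{start}-{end} ({length} bp)")
```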

The BAC clone 131-p-15, which contains the first two exons of LRP5, was sequenced extensively (approximately 6× coverage). BAC clone 53-d-8 contains sequences from exon D to exon V; however, the sequence coverage of this clone was only approximately 1× (skim sequencing). The skim sequencing of mouse BAC 53-d-8 detected 76% of the exons, although in some instances only a portion of an exon was present in the mouse sequence data. In addition to the exons, there were three patches in the BAC 53-d-8 sequences that exhibited a degree of sequence conservation with the human sequences (Table 12). All of these were located in the large 20 kb intron between exons D and E. These sequences may represent regions that are important for the processing of this large intron, and thus polymorphisms in these sequences may affect the expression level of LRP5.
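The difference between 6× and 1× coverage can be put in perspective with the Lander-Waterman (Poisson) model, which assumes randomly placed reads and ignores cloning and sequencing bias. Under that idealization, the expected fraction of a target covered at least once at mean depth c is 1 - e^(-c). This model is not part of the original analysis; the sketch only illustrates the expected trend.

```python
import math

def expected_fraction_covered(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases covered at least once."""
    return 1.0 - math.exp(-coverage)

for depth in (1.0, 6.0):
    print(f"{depth:.0f}x coverage: {expected_fraction_covered(depth):.1%} of bases expected")
```

At 1× the model predicts roughly 63% of bases covered, against nearly complete coverage at 6×. A somewhat higher exon detection rate than the base-level figure is plausible, since, as noted above, an exon was counted as detected even when only part of it was present in the mouse sequence data.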

Determination of Relative Abundance of Alternatively Spliced LRP5 mRNA Transcripts

Several techniques may be used to determine the relative abundance of the different alternatively spliced isoforms of LRP5.

Northern blot analysis with probes derived from specific transcripts is used to survey tissues for the abundance of a particular transcript. More sensitive techniques, such as RNase protection assays, are also performed. Reagents from commercially available kits (Ambion, Inc., Austin, Tex.) are used to prepare probes. The relative abundance of transcript that hybridizes to a probe radiolabeled with [α-32P]UTP is analyzed on native and denaturing acrylamide gels (Novex Inc., San Diego, Calif.). Primer extension assays are performed according to established procedures (Sambrook et al. (1989) Molecular Cloning, Cold Spring Harbor Press, New York) using reverse primers derived from the 5′ portion of the transcript.
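When isoform-specific protected fragments of different lengths are compared, band intensity scales with the number of labeled nucleotides in each fragment, so raw intensities are usually normalized before relative abundances are computed. The sketch below assumes uniform [α-32P]UTP labeling, so that signal is taken as proportional to the number of U residues in the protected region of the antisense probe; the fragment sequences and densitometry values are hypothetical.

```python
def relative_abundance(bands):
    """bands: mapping isoform -> (protected_probe_seq, band_intensity).

    Each intensity is divided by the number of U residues in the protected
    (antisense) probe fragment, assuming uniform [alpha-32P]UTP labeling.
    Returns each isoform's fraction of the total normalized signal.
    """
    normalized = {}
    for isoform, (probe_seq, intensity) in bands.items():
        u_count = probe_seq.upper().count("U")
        if u_count == 0:
            raise ValueError(f"{isoform}: protected fragment has no labeled U")
        normalized[isoform] = intensity / u_count
    total = sum(normalized.values())
    return {isoform: value / total for isoform, value in normalized.items()}

# Hypothetical data: a longer and a shorter protected fragment.
bands = {
    "isoform_1": ("GAUCUUAGCAUUGCAUAGCU", 800.0),  # 7 U residues
    "isoform_2": ("GCAUAGCUGU", 200.0),            # 3 U residues
}
print(relative_abundance(bands))
```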

Isolation of Other Species Homologs of LRP5 Gene

The LRP5 gene from different species, e.g. rat and dog, is isolated by screening a cDNA library with portions of the gene that have been obtained from cDNA of the species of interest using PCR primers designed from the human LRP5 sequence. Degenerate PCR is performed with primers of 17-20 nucleotides and 32- to 128-fold degeneracy, designed by selecting regions that code for amino acids with low codon degeneracy, e.g. Met and Trp. When selecting these primers, preference is given to regions that are conserved in the protein, e.g. the motifs shown in FIG. 6b. PCR products are analyzed by DNA sequencing to confirm their similarity to human LRP5. The correct product is used to screen cDNA libraries by colony or plaque hybridization at high stringency. Alternatively, probes derived directly from the human LRP5 gene are used to isolate the LRP5 cDNA sequence from different species by hybridization at reduced stringency. A cDNA library is generated as described above.
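The degeneracy of a primer encoding a stretch of protein is the product of the number of codons for each residue under the standard genetic code, which is why Met and Trp, with one codon each, are favoured. The sketch below scans a protein sequence for six-residue windows (18 nucleotides, within the 17-20 nt range above) whose degeneracy does not exceed 128. The window size and the use of full codons for every residue are simplifying assumptions (the 3′-terminal codon is often truncated in practice), and SEQ ID NO:406 is scanned only as a convenient example, not as a recommended primer site.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def degeneracy(peptide: str) -> int:
    """Fold-degeneracy of a primer encoding every codon of the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())

def candidate_windows(protein: str, window: int = 6, max_fold: int = 128):
    """Yield (start, peptide, fold) for windows at or below max_fold."""
    for i in range(len(protein) - window + 1):
        pep = protein[i:i + window]
        fold = degeneracy(pep)
        if fold <= max_fold:
            yield i, pep, fold

for start, pep, fold in candidate_windows("IQGHLDFVMDILVFHS"):
    print(start, pep, fold)  # prints: 5 DFVMDI 96
```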

REFERENCES

1. Bach, J.-F. (1994). Endocr. Rev. 15: 516-542.

2. Bain, S., et al. (1992). Diabetes 41: 91A.

3. Bell, G. I., et al. (1984). Diabetes 33: 176-183.

4. Bennett, S. T., et al. (1995). Nature Genet. 9: 284-292.

5. Bennett, S. T. and Todd, J. A. (1996). Annu. Rev. Genet. 30: 343-370.

6. Buckler, A., et al. (1991). P.N.A.S. USA 88: 4005-4009.

7. Davies, J. L., et al. (1994). Nature 371: 130-136.

8. Doria, A., et al. (1996). Diabetologia 39: 594-599.

9. Hashimoto, L., et al. (1994). Nature 371: 161-164.

10. Holmans, P. (1993). Am. J. Hum. Genet. 52: 362-374.

11. Julier, C., et al. (1991a). Nature 354: 155-159.

12. Kennedy, G. C., et al. (1995). Nature Genet. 9: 293-298.

13. Kyvik, K. O., et al. (1995). Brit. Med. J. 311: 913-917.

14. Lucassen, A., et al. (1993). Nature Genet. 4: 305-310.

15. Lucassen, A., et al. (1995). Hum. Mol. Genet. 4: 501-506.

16. Luo, D.-F., et al. (1996). Hum. Mol. Genet. 5: 693-698.

17. Matsuda, A. and Kuzuya, T. (1994). Diab. Res. Clin. Pract. 24 Suppl.: S63-S67.

18. Risch (1987). Am. J. Hum. Genet. 40: 1-14.

19. Owerbach, D., et al. (1990). Diabetes 39: 1504-1509.

20. Parimoo, S., et al. (1991). P.N.A.S. USA 88: 9623-9627.

21. Penrose, L. S. (1953). Acta Genet. Stat. Med. 4: 257-265.

22. Rich, S. S. (1990). Diabetes 39: 1315-1319.

23. Spielman, R., et al. (1993). Am. J. Hum. Genet. 52: 506-516.

24. Thomson, G., et al. (1989). Genet. Epidemiol. 6: 155-160.

25. Tisch, R. and McDevitt, H. O. (1996). Cell 85: 291-297.

26. Todd, J. A. (1994). Diabetic Med. 11: 6-16.

27. Todd, J. A., et al. (1987). Nature 329: 599-604.

28. Todd, J. A. and Farrall, M. (1996). Hum. Mol. Genet. 5: 1443-1448.

29. Todd, J. A., et al. (1989). Nature 338: 587-589.

30. Vafiadis, P., et al. (1996). J. Autoimmunity 9: 397-403.

TABLE 1 Haplotype analysis at D11S1917 (UT5620) - H0570POLYA within 2582 families from the UK, USA, Norway and Sardinia. Susceptible, protective and neutral alleles were identified at each polymorphism, and transmission of recombinant haplotypes to diabetic offspring was calculated (t = transmission, nt = non-transmission). Significant transmission of the haplotype 332-104 was detected (P = 0.005), as well as significant non-transmission of the haplotype 328-103 (P = 0.03).
 | D11S1917 (UT5620) | H0570POLYA | t | nt | P
 | 328 | 104 | 539 | 474 |
Protective | 332 | 103 | 427 | 521 | 0.002
Susceptible | 332 | 104 | 60 | 33 | 0.005
Protective | 328 | 103 | 16 | 31 | 0.03
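The P values in Table 1 are consistent with a transmission/disequilibrium-style McNemar test, in which χ² = (t - nt)² / (t + nt) with one degree of freedom. The text does not name the test used, so this is an assumption. The sketch recomputes P from the t and nt counts using only the standard library, relying on the identity that for one degree of freedom P = erfc(√(χ²/2)).

```python
import math

def tdt_chi_square(t: int, nt: int) -> float:
    """McNemar/TDT statistic comparing transmitted vs non-transmitted counts."""
    return (t - nt) ** 2 / (t + nt)

def chi2_1df_p(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

rows = [("332-103", 427, 521), ("332-104", 60, 33), ("328-103", 16, 31)]
for haplotype, t, nt in rows:
    chi2 = tdt_chi_square(t, nt)
    print(f"{haplotype}: chi2 = {chi2:.2f}, P = {chi2_1df_p(chi2):.3f}")
```

With these counts the recomputed P values agree with the tabulated 0.002, 0.005 and 0.03 to the precision given.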

TABLE 2 PCR Primers for obtaining LRP5 cDNA
Primers located within LRP5 cDNA: The primers are numbered beginning at nucleotide 1 in FIG. 5(a) (SEQ ID NO: 1).
1F (muex 1f): ATGGAGCCCGAGTGAGC (SEQ ID NO:49)
218R (27R): ATGGTGGACTCCAGCTTGAC (SEQ ID NO:50)
256F (1F): TTCCAGTTTTCCAAGGGAG (SEQ ID NO:51)
265R (26R): AAAACTGGAAGTCCACTGCG (SEQ ID NO:52)
318R (4R): GGTCTGCTTGATGGCCTC (SEQ ID NO:53)
343F (2F): GTGCAGAACGTGGTCATCT (SEQ ID NO:54)
361R (21R): GTGCAGAACGTGGTCATCT (SEQ ID NO:54)
622R (2R): AGTCCACAATGATCTTCCGG (SEQ ID NO:55)
638F (4F): CCAATGGACTGACCATCGAC (SEQ ID NO:56)
657R (1R): GTCGATGGTCAGTCCATTGG (SEQ ID NO:57)
956R (22R): TTGTCCTCCTCACAGCGAG (SEQ ID NO:58)
1713F (21F): GGACTTCATCTACTGGACTG (SEQ ID NO:59)
1481R (23R): CAGTCTGTCCAGTACATGAG (SEQ ID NO:60)
1981F (22F): GCCTTCTTGGTCTTCACCAG (SEQ ID NO:61)
2261F (23F): GGACCAACAGAATCGAAGTG (SEQ ID NO:62)
2484R (5R): GTCAATGGTGAGGTCGT (SEQ ID NO:63)
2519F (5F): ACACCAACATGATCGAGTCG (SEQ ID NO:64)
3011F (24F): ACAAGTTCATCTACTGGGTG (SEQ ID NO:65)
3154F (25F): CGGACACTGTTCTGGACGTG (SEQ ID NO:66)
3173R (25R): CACGTCCAGAACAGTGTCCG (SEQ ID NO:67)
3556R (3R): TCCAGTAGAGATGCTTGCCA (SEQ ID NO:68)
3577F (3F): ATCGAGCGTGTGGAGAAGAC (SEQ ID NO:69)
4094F (30F): TCCTCATCAAACAGCAGTGC (SEQ ID NO:70)
4173R (6R): CGGCTTGGTGATTTCACAC (SEQ ID NO:71)
4687F (6F): GTGTGTGACAGCGACTACAGC (SEQ ID NO:72)
4707R (30R): GCTGTAGTCGCTGTCACACAC (SEQ ID NO:73)
5061R (7R): GTACAAAGTTCTCCCAGCCC (SEQ ID NO:74)
PCR primers in sequences identified by GRAIL:
G1 1F: TCTTCTCCAGAGGATGCAGC (SEQ ID NO:75)
G1 2F: TTCGTCTTGAACTTCCCAGC (SEQ ID NO:76)
G1 3F: TCTTCTTCTCCAGAGGATGCA (SEQ ID NO:77)
Gp1 1F: AGGCTGGTCTCAAACTCCTG (SEQ ID NO:78)
Vector primers for RCCA:
PBS.543R: GGGGATGTGCTGCAAGGCGA (SEQ ID NO:79)
PBS.578R: CCAGGGTTTTCCCAGTCACGAC (SEQ ID NO:80)
PBS.838F: TTGTGTGGAATTGTGAGCGGATAAC (SEQ ID NO:81)
PBS.873F: CCCAGGCTTTACACTTTATGCTTCC (SEQ ID NO:82)

TABLE 3 Intron-Exon Organization of Human LRP-5
Exon number (letter) | 3′ acceptor sequence (intron/Exon) | Exon size (bp) | 5′ donor sequence (Exon/intron) | Intron number (size, bp)
Ex 1 (6) | ccgggtcaac/ATCGGAG (SEQ ID NO:411) | 91 | CCGCGG/gtaggtgggc (SEQ ID NO:412) | 1 (350S1)
Ex 2 (A) | tgccccacag/CCTCGC (SEQ ID NO:413) | 391 | TCACGG/gtaaaccctg (SEQ ID NO:414) | 2 (9408)
Ex 3 (B) | ccgtcacag/GTACAT (SEQ ID NO:415) | 198 | GTTCCG/gtaggtaccc (SEQ ID NO:416) | 3 (6980)
Ex 4 (C) | ctgactgcag/GCAGAA (SEQ ID NO:417) | 197 | CTTTCT/gtgagtgccg (SEQ ID NO:418) | 4 (1640)
Ex 5 (D) | gttttcccag/TCCACA (SEQ ID NO:419) | 132 | AGGCAG/gtgaggcggt (SEQ ID NO:420) | 5 (20823)
Ex 6 (E) | gtctccacag/GAGCCG (SEQ ID NO:421) | 397 | GATGGG/gtaagacggg (SEQ ID NO:422) | 6 (3213)
Ex 7 (F) | tcttctccag/CCTCAT (SEQ ID NO:423) | 172 | ATCGAG/gtgaggctcc (SEQ ID NO:424) | 7 (13445)
Ex 8 (G) | cgtcctgcag/GTGATC (SEQ ID NO:425) | 217 | TCGTCG/gtgagtccgg (SEQ ID NO:426) | 8 (2826)
Ex 9 (H) | tcgcttccag/GAACCA (SEQ ID NO:427) | 290 | CTGAAG/gtagcgtggg (SEQ ID NO:428) | 9 (5000+)
Ex 10 (I) | ctgctgccag/ACCATC (SEQ ID NO:429) | 227 | CAAGGG/gtaagtgttt (SEQ ID NO:430) | 10 (1295)
Ex 11 (J) | tgccttccag/CTACAT (SEQ ID NO:431) | 185 | TGCTGG/gtgagggccg (SEQ ID NO:432) | 11 (2068)
Ex 12 (K) | gttcatgcag/GTCAGG (SEQ ID NO:433) | 324 | GCAGCC/gtaagtgcct (SEQ ID NO:434) | 12 (2005)
Ex 13 (L) | cctcctctag/CGCCCA (SEQ ID NO:435) | 200 | ACCCAG/gcaggtgccc (SEQ ID NO:436) | 13 (6963)
Ex 14 (M) | tgtcttacag/CCCTTT (SEQ ID NO:437) | 209 | GCGAGG/gtaggaggcc (SEQ ID NO:438) | 14 (1405)
Ex 15 (N) | cctcccgcag/GTACCT (SEQ ID NO:439) | 191 | TGTCAG/gtaaggggcc (SEQ ID NO:440) | 15 (686)
Ex 16 (O) | ctgcttgcag/GGGCCA (SEQ ID NO:441) | 210 | AGTTCT/gtacgtgggg (SEQ ID NO:442) | 16 (3894)
Ex 17 (P) | gtctttgcag/CAGCCC (SEQ ID NO:443) | 126 | GTGGAG/gtaggtgtga (SEQ ID NO:444) | 17 (3903)
Ex 18 (Q) | cctcccccag/AGCCGC (SEQ ID NO:445) | 237 | GTGACG/gtgaggccct (SEQ ID NO:446) | 18 (3042)
Ex 19 (R) | tcccttgcag/CCATCT (SEQ ID NO:447) | 111 | TGTGTG/gtgagccagc (SEQ ID NO:448) | 19 (1448)
Ex 20 (S) | tctctggcag/AAATCA (SEQ ID NO:449) | 237 | TCACAG/gtaaggagcc (SEQ ID NO:450) | 20 (1095)
Ex 21 (T) | tccctgccag/GCATCG (SEQ ID NO:451) | 140 | CCGCCG/gtgaggggcg (SEQ ID NO:452) | 21 (6514)
Ex 22 (U) | ctctcctcag/ATCCTG (SEQ ID NO:453) | 98 | GTACAG/gtaggacatc (SEQ ID NO:454) | 22 (2275)
Ex 23 (V) | tccctttcag/GCCCTA (SEQ ID NO:455) | >262 | | 23 (19985)
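The donor and acceptor columns of Table 3 can be checked against the canonical splice-site dinucleotides: introns normally begin with GT (less often GC) and end with AG. The sketch below parses the "intron/EXON" and "EXON/intron" strings used in the table and classifies the donor; the function names are illustrative, and only three rows are included as input.

```python
def donor_dinucleotide(donor: str) -> str:
    """First two intronic bases from an 'EXON/intron' donor string."""
    _exon, intron = donor.split("/")
    return intron[:2].upper()

def acceptor_dinucleotide(acceptor: str) -> str:
    """Last two intronic bases from an 'intron/EXON' acceptor string."""
    intron, _exon = acceptor.split("/")
    return intron[-2:].upper()

def classify_donor(donor: str) -> str:
    dinuc = donor_dinucleotide(donor)
    if dinuc == "GT":
        return "canonical GT"
    if dinuc == "GC":
        return "non-canonical GC"
    return f"unusual {dinuc}"

# Acceptor/donor pairs for exons 12, 13 and 14 from Table 3.
rows = {
    "Ex 12 (K)": ("gttcatgcag/GTCAGG", "GCAGCC/gtaagtgcct"),
    "Ex 13 (L)": ("cctcctctag/CGCCCA", "ACCCAG/gcaggtgccc"),
    "Ex 14 (M)": ("tgtcttacag/CCCTTT", "GCGAGG/gtaggaggcc"),
}
for exon, (acceptor, donor) in rows.items():
    print(exon, "acceptor", acceptor_dinucleotide(acceptor), "|", classify_donor(donor))
```

Applied to all 22 donor sequences in Table 3, only the intron 13 donor (ACCCAG/gcaggtgccc) falls outside the GT rule; GC donors are a recognized minor class of splice site.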

TABLE 4 LRP-5 Exon primers
E1x1 1f CAGGGTTTCATCCTTTGTGG (SEQ ID NO: 83)
E1x1 1fU TGTAAAACGACGGCCAGTCAGGGTTTCATCCTTTGTGG (SEQ ID NO: 84)
E1x1 1fR GCTATGACCATGATTACGCCCAGGGTTTCATCCTTTGTGG (SEQ ID NO: 85)
E1x1 1r TGACGGGAAGAGTTCCTCAG (SEQ ID NO: 86)
E1x1 1rR GCTATGACCATGATTACGCCTGACGGGAAGAGTTCCTCAG (SEQ ID NO: 87)
E1x5 1f TCTGCTCTTCCTGAACTGCC (SEQ ID NO: 88)
E1x5 1fU TGTAAAACGACGGCCAGTTCTGCTCTTCCTGAACTGCC (SEQ ID NO: 89)
E1x5 1r TTGAGTCCTTCAACAAGCCC (SEQ ID NO: 90)
E1x5 1rR GCTATGACCATGATTACGCCTTGAGTCCTTCAACAAGCCC (SEQ ID NO: 91)
E1x6 1fU TGTAAAACGACGGCCAGTTTCCCCACTCATAGAGGCTC (SEQ ID NO: 92)
E1x6 1rR GCTATGACCATGATTACGCCGCTCCCAACTCGCCAAGT (SEQ ID NO: 93)
E1x6a 1fU TGTAAAACGACGGCCAGTGGTCAACATGGAGGCAGC (SEQ ID NO: 94)
E1x6a 1rR GCTATGACCATGATTACGCCCAGGTGTCAGTCCGCTTG (SEQ ID NO: 95)
E1x6b 1fU TGTAAAACGACGGCCAGTGCAGAGAAGTTCTGAGC (SEQ ID NO: 96)
E1x6b 1rR GCTATGACCATGATTACGCCCACTTGGCCAGCCATACTC (SEQ ID NO: 97)
E1x6c 1fU TGTAAAACGACGGCCAGTCAAGCAAGCCTCTTGCTACC (SEQ ID NO: 98)
E1x6c 1rR GCTATGACCATGATTACGCCACTGCAATGAGGTGAAAGGC (SEQ ID NO: 99)
E1x6d 1fU TGTAAAACGACGGCCAGTCAGGTGAGAACAAGTGTCCG (SEQ ID NO: 100)
E1x6d 1rR GCTATGACCATGATTACGCCGCTGCCTCCATGTTGACC (SEQ ID NO: 101)
E1x6e 1fU TGTAAAACGACGGCCAGTTGTGCCTGGGTGAGATTCT (SEQ ID NO: 102)
E1x6e 1rR GCTATGACCATGATTACGCCTGTGGAGCCTCTATGAGTGG (SEQ ID NO: 103)
E1x6f 1fU TGTAAAACGACGGCCAGTGGGTGACAGGTGGCAGTAG (SEQ ID NO: 104)
E1x6f 1rR GCTATGACCATGATTACGCCGGAAGGAAGGACACTTGAGC (SEQ ID NO: 105)
E1x6g 1fU TGTAAAACGACGGCCAGTCCTGGTGTGTTTGAGAACCC (SEQ ID NO: 106)
E1x6g 1rR GCTATGACCATGATTACGCCCAATGGGAAGCCAGGCTAG (SEQ ID NO: 107)
E1xA 1f ATCTTGCTGGCTTAGCCAGT (SEQ ID NO: 108)
E1xA 1fU TGTAAAACGACGGCCAGTATCTTGCTGGCTTAGCCAGT (SEQ ID NO: 109)
E1xA 1fR GCTATGACCATGATTACGCCATTTTGCTGGCTTAGGCAGT (SEQ ID NO: 110)
E1xA 1r GCTCATGCAAATTCGAGAGAG (SEQ ID NO: 111)
E1xA 1rR GCTATGACCATGATTACGCCGCTCATGCAAATTCGAGAGAG (SEQ ID NO: 112)
E1xB 1f CCTGTGGGTTATTTCCGATGG (SEQ ID NO: 113)
E1xB 1fU TGTAAAACGACGGCCAGTCCTGTTGGTTATTTCCGATGG (SEQ ID NO: 114)
E1xB 1fR GCTATGACCATGATTACGCCCCTGTTGGTTATTTCCGATGG (SEQ ID NO: 115)
E1xB 1r CCTGAGTTAAGAAGGAACGCC (SEQ ID NO: 116)
E1xB 1rR GCTATGACCATGATTACGCCCCTGAGTTAAGAAGGAACGCC (SEQ ID NO: 117)
E1xC 1f AATTGGTCAGCAGCAATG (SEQ ID NO: 118)
E1xC 1fR GCTATGACCATGATTACGCCAATTGGGTCAGCAGCAATG (SEQ ID NO: 119)
E1xC 2f AATTGGGTCAGCAGCAATG (SEQ ID NO: 120)
E1xC 2fU TGTAAAACGACGGCCAGTAATTGGGTCAGCAGCAATG (SEQ ID NO: 121)
E1xC 2fR GCTATGACCATGATTACGCCAATTGGGTCAGCAGCAATG (SEQ ID NO: 119)
E1xC 1r TTGGATCGCTAGAGATTGGG (SEQ ID NO: 122)
E1xC 1rR GCTATGACCATGATTACGCCTTGGATCGCTAGAGATTGGG (SEQ ID NO: 123)
E1xC 2r GCACCCTAATTGGCACTCA (SEQ ID NO: 124)
E1xC 2rR GCTATGACCATGATTACGCCGCACCCTAATTGGCACTCA (SEQ ID NO: 125)
E1xD 1f TGACGGTCCTCTTCTGGAAC (SEQ ID NO: 126)
E1xD 1fR GCTATGACCATGATTACGCCTGACGGTCCTCTTCTGGAAC (SEQ ID NO: 127)
E1xD 2f CGAGGCAGGATGTGACTCAT (SEQ ID NO: 128)
E1xD 2fU TGTAAAACGACGGCCAGTCGAGGCAGGATGTGACTCAT (SEQ ID NO: 129)
E1xD 2fR GCTATGACCATGATTACGCCCGAGGCAGGATGTGACTCAT (SEQ ID NO: 130)
E1xD 1r AGTGGATCATTTCGAACGG (SEQ ID NO: 131)
E1xD 1rR GCTATGACCATGATTACGCCAGTGGATCATTTCGAACGG (SEQ ID NO: 132)
E1xD 2r CCAACTCAGCTTCCCGAGTA (SEQ ID NO: 133)
E1xD 2rR GCTATGACCATGATTACGCCCCAACTCAGCTTCCCGAGTA (SEQ ID NO: 134)
E1xE 1f TGGCTGAGTATTTCCCTTGC (SEQ ID NO: 135)
E1xE 1fU TGTAAAACGACGGCCAGTTGGCTGAGTATTTCCCTTGC (SEQ ID NO: 136)
E1xE 1fR GCTATGACCATGATTACGCCTGGCTGAGTATTTCCCTTGC (SEQ ID NO: 137)
E1xE 1r TTTAACAAGCCCTCCTCCG (SEQ ID NO: 138)
E1xE 1rR GCTATGACCATGATTACGCCTTTAACAAGCCCTCCTCCG (SEQ ID NO: 139)
E1xF 1f CAACGCCAGCATCTACTGA (SEQ ID NO: 140)
E1xF 1fU TGTAAAACGACGGCCAGTCAACGCCAGCATCTACTGA (SEQ ID NO: 141)
E1xF 1fR GCTATGACCATGATTACGCCCAACGCCAGCATCTACTGA (SEQ ID NO: 142)
E1xF 1r CAAATAGCAGAGCACAGGCA (SEQ ID NO: 143)
E1xF 1rR GCTATGACCATGATTACGCCCAAATAGCAGAGCACAGGCA (SEQ ID NO: 144)
E1xG 1f TGAAGTTGCTGCTCTTGGG (SEQ ID NO: 145)
E1xG 1fU TGTAAAACGACGGCCAGTTGAAGTTGCTGCTCTTGGG (SEQ ID NO: 146)
E1xG 1fR GCTATGACCATGATTACGCCTGAAGTTGCTGCTCTTGGG (SEQ ID NO: 147)
E1xG 1r CACTTCCTCCTCATGCAAGTC (SEQ ID NO: 148)
E1xG 1rR GCTATGACCATGATTACGCCCACTTCCTCCTCATGCAAGTC (SEQ ID NO: 149)
E1xH 1f AGACTGGAGCCTCTGTGTTCG (SEQ ID NO: 150)
E1xH 1fU TGTAAAACGACGGCCAGTAGACTGGAGCCTCTGTGTTCG (SEQ ID NO: 151)
E1xH 1fR GCTATGACCATGATTACGCCAGACTGGAGCCTCTGTGTTCG (SEQ ID NO: 152)
E1xH 1r TGTGTGTCTACCGGACTTGC (SEQ ID NO: 153)
E1xH 1rR GCTATGACCATGATTACGCCTGTGTGTCTACCGGACTTGC (SEQ ID NO: 154)
E1xH 2r GAACAGAGGCAAGGTTTTCCC (SEQ ID NO: 155)
E1xH 2rR GCTATGACCATGATTACGCCGAACAGAGGCAAGGTTTTCCC (SEQ ID NO: 156)
E1xI 1f AGAATCGCTTGAACCCAGG (SEQ ID NO: 157)
E1xI 1fR GCTATGACCATGATTACGCCAGAATCGCTTGAACCCAGG (SEQ ID NO: 158)
E1xI 2f GCTGGTTCCTAAAATGTGGC (SEQ ID NO: 159)
E1xI 2fU TGTAAAACGACGGCCAGTGCTGGTTCCTAAAATGTGGC (SEQ ID NO: 160)
E1xI 2fR GCTATGACCATGATTACGCCGCTGGTTCCTAAAATGTGGC (SEQ ID NO: 161)
E1xI 1r CATACGAGGTGAACACAAGGAC (SEQ ID NO: 162)
E1xI 1rR GCTATGACCATGATTACGCCCATACGAGGTGAACACAAGGAC (SEQ ID NO: 163)
E1xJ 1f TGAAGAGGTGGGGACAGTTG (SEQ ID NO: 164)
E1xJ 1fR GCTATGACCATGATTACGCCTGAAGAGGTGGGGACAGTTG (SEQ ID NO: 165)
E1xJ 2f CTTGTGCCTTCCAGCTACATC (SEQ ID NO: 166)
E1xJ 2fU TGTAAAACGACGGCCAGTCTTGTGCCTTCCAGCTACATC (SEQ ID NO: 167)
E1xJ 2fR GCTATGACCATGATTACGCCCTTGTGCCTTCCAGCTACATC (SEQ ID NO: 168)
E1xJ 1r AGTCCTGGCACAGGGATTAG (SEQ ID NO: 169)
E1xJ 1rR GCTATGACCATGATTACGCCAGTCCTGGCACAGGGATTAG (SEQ ID NO: 170)
E1xJ 2r ATAACTGCAGCAAAGGCACC (SEQ ID NO: 171)
E1xJ 2rR GCTATGACCATGATTACGCCATAACTGCAGCAAAGGCACC (SEQ ID NO: 172)
E1xK 1f GCTTCAGTGGATCTTGCTGG (SEQ ID NO: 173)
E1xK 1fU TGTAAAACGACGGCCAGTGCTTCAGTGGATCTTGCTGG (SEQ ID NO: 174)
E1xK 1fR GCTATGACCATGATTACGCCGCTTCAGTGGATCTTGCTGG (SEQ ID NO: 175)
E1xK 1r TGTGCAGTGCACAACCTACC (SEQ ID NO: 176)
E1xK 1rR GCTATGACCATGATTACGCCTGTGCAGTGCACAACCTACC (SEQ ID NO: 177)
E1xL 1f GTTGTCGAGTGGCGTGCTAT (SEQ ID NO: 178)
E1xL 1fU TGTAAAACGACGGCCAGTGTTGTCGAGTGGCGTGCTAT (SEQ ID NO: 179)
E1xL 1fR GCTATGACCATGATTACGCCGTTGTCGAGTGGCGTGCTAT (SEQ ID NO: 180)
E1xL 1r AAAAGTCCTGTGGGGTCTGA (SEQ ID NO: 181)
E1xL 1rR GCTATGACCATGATTACGCCAAAAGTCCTGTGGGGTCTGA (SEQ ID NO: 182)
E1xM 1f AGAAGTGTGGCCTCTGCTGT (SEQ ID NO: 183)
E1xM 1fU TGTAAAACGACGGCCAGTAGAAGTGTGGCCTCTGCTGT (SEQ ID NO: 184)
E1xM 1fR GCTATGACCATGATTACGCCAGAAGTGTGGCCTCTGCTGT (SEQ ID NO: 185)
E1xM 1r GTGAAAGAGCCTGTGTTTGCT (SEQ ID NO: 186)
E1xM 1rR GCTATGACCATGATTACGCCGTGAAAGAGCCTGTGTTTGCT (SEQ ID NO: 187)
E1xN 1f AGACCCTGCTTCCAAATAAGC (SEQ ID NO: 188)
E1xN 1fU TGTAAAACGACGGCCAGTAGACCCTGCTTCCAAATAAGC (SEQ ID NO: 189)
E1xN 1fR GCTATGACCATGATTACGCCAGACCCTGCTTCCAAATAAGC (SEQ ID NO: 190)
E1xN 1r ACTCATTTTCTGCCTCTGCC (SEQ ID NO: 191)
E1xN 1rR GCTATGACCATGATTACGCCACTCATTTTCTGCCTCTGCC (SEQ ID NO: 192)
E1xO 1f TGGCAGTCCTGTCAACCTCT (SEQ ID NO: 193)
E1xO 1fU TGTAAAACGACGGCCAGTTGGCAGTCCTGTCAACCTCT (SEQ ID NO: 194)
E1xO 1fR GCTATGACCATGATTACGCCTGGCAGTCCTGTCAACCTCT (SEQ ID NO: 195)
E1xO 1r CACACAGGATCTTGCACTGG (SEQ ID NO: 196)
E1xO 1rR GCTATGACCATGATTACGCCCACACAGGATCTTGCACTGG (SEQ ID NO: 197)
E1xP 1f AGGGCCAGTTCTCATGAGTT (SEQ ID NO: 198)
E1xP 1fU TGTAAAACGACGGCCAGTAGGGCCAGTTCTCATGAGTT (SEQ ID NO: 199)
E1xP 1fR GCTATGACCATGATTACGCCAGGGCCAGTTCTCATGAGTT (SEQ ID NO: 200)
E1xP 1r GGGCAAAGGAAGACACAATC (SEQ ID NO: 201)
E1xP 1rR GCTATGACCATGATTACGCCGGGCAAAGGAAGACACAATC (SEQ ID NO: 202)
E1xQ 1f CAACTTCTGCTTTGAAGCCC (SEQ ID NO: 203)
E1xQ 1fU TGTAAAACGACGGCCAGTCAACTTCTGCTTTGAAGCCC (SEQ ID NO: 204)
E1xQ 1fR GCTATGACCATGATTACGCCCAACTTCTGCTTTGAAGCCC (SEQ ID NO: 205)
E1xQ 1r GACAGACTTGGCAATCTCCC (SEQ ID NO: 206)
E1xQ 1rR GCTATGACCATGATTACGCCGACAGACTTGGCAATCTCCC (SEQ ID NO: 207)
E1xR 1f TCTGCTCTCTGTTTGGAGTCC (SEQ ID NO: 208)
E1xR 1fU TGTAAAACGACGGCCAGTTCTGCTCTCTGTTTGAGTCC (SEQ ID NO: 209)
E1xR 1fR GCTATGACCATGATTACGCCTCTGCTCTCTGTTTGGAGTCC (SEQ ID NO: 210)
E1xR 1r CCCTAAACTCCACGTTCCTG (SEQ ID NO: 211)
E1xR 1rR GCTATGACCATGATTACGCCCCCTAAACTCCACGTTCCTG (SEQ ID NO: 212)
E1xS 1f GGGTTAATGTTGGCCACATC (SEQ ID NO: 213)
E1xS 1fR GCTATGACCATGATTACGCCGGGTTAATGTTGGCCACATC (SEQ ID NO: 214)
E1xS 2f TTGGCAGGGATGTGTTGAG (SEQ ID NO: 215)
E1xS 2fU TGTAAAACGACGGCCAGTTTGGCAGGGATGTGTTGAG (SEQ ID NO: 216)
E1xS 2fR GCTATGACCATGATTACGCCTTGGCAGGGATGTGTTGAG (SEQ ID NO: 217)
E1xS 1r GTCTGCCACATGTCAAGAG (SEQ ID NO: 218)
E1xS 1rR GCTATGACCATGATTACGCCGTCTGCCACATGTGCAAGAG (SEQ ID NO: 219)
E1xT 1f TGGTCTGAGTCTCGTGGGTA (SEQ ID NO: 220)
E1xT 1fU TGTAAAACGACGGCCAGTTGGTCTGAGTCTCGTGGGTA (SEQ ID NO: 221)
E1xT 1fR GCTATGACCATGATTACGCCTGGTCTGAGTCTCGTGGGTA (SEQ ID NO: 222)
E1xT 1r GAGGTGGATTTGGGTGAGATT (SEQ ID NO: 223)
E1xT 1rR GCTATGACCATGATTACGCCGAGGTGGATTTGGGTGAGATT (SEQ ID NO: 224)
E1xU 1f AGCCCTCTCTGCAAGGAAAG (SEQ ID NO: 225)
E1xU 1fU TGTAAAACGACGGCCAGTAGCCCTCTCTGCAAGGAAAG (SEQ ID NO: 226)
E1xU 1fR GCTATGACCATGATTACGCCAGCCCTCTCTGCAAGGAAAG (SEQ ID NO: 227)
E1xU 1r CAGAACGTGGAGTTCTGCTG (SEQ ID NO: 228)
E1xU 1rR GCTATGACCATGATTACGCCCAGAACGTGGAGTTCTGCTG (SEQ ID NO: 229)
E1xV 1f TACCGAATCCCACTCCTCTG (SEQ ID NO: 230)
E1xV 1fU TGTAAAACGACGGCCAGTTACCGAATCCCACTCCTCTG (SEQ ID NO: 231)
E1xV 1fR GCTATGACCATGATTACGCCTACCGAATCCCACTCCTCTG (SEQ ID NO: 232)
E1xV 2f CATGGTAGAGGTGGGACCAT (SEQ ID NO: 233)
E1xV 2fU TGTAAAACGACGGCCAGTCATGGTAGAGGTGGGACCAT (SEQ ID NO: 234)
E1xV 2fR GCTATGACCATGATTACGCCCATGGTAGAGGTGGGACCAT (SEQ ID NO: 235)
E1xV 1r GATATCCACCTCTGCCCAAG (SEQ ID NO: 236)
E1xV 1rR GCTATGACCATGATTACGCCGATATCCACCTCTGCCCAAG (SEQ ID NO: 237)
E1xV 2r TTACAGGGGCACAGAGAAGC (SEQ ID NO: 238)
E1xV 2rR GCTATGACCATGATTACGCCTTACAGGGGCACAGAGAAGC (SEQ ID NO: 239)
57-1 1f GCAACAGAGCAAGACCCTGT (SEQ ID NO: 240)
57-1 1fR GCTATGACCATGATTACGCCGCAACAGAGCAAGACCCTGT (SEQ ID NO: 241)
57-1 1r AAATTAGCCAGGCATGGTG (SEQ ID NO: 242)
57-1 1rR GCTATGACCATGATTACGCCAAATTAGCCAGGCATGGTG (SEQ ID NO: 243)
57-1 1fU TGTAAAACGACGGCCAGTGCAACAGAGCAAGACCCTGT (SEQ ID NO: 244)
57-2 1f CCTGCAGAAGGAAACCTGAC (SEQ ID NO: 245)
57-2 1fR GCTATGACCATGATTACGCCCCTGCAGAAGGAAACCTGAC (SEQ ID NO: 246)
57-2 1r CTGCATCTTTGCCACCATG (SEQ ID NO: 247)
57-2 1rR GCTATGACCATGATTACGCCCTGCATCTTTGCCACCATG (SEQ ID NO: 248)
57-2 1fU TGTAAAACGACGGCCAGTCCTGCAGAAGGAAACCTGAC (SEQ ID NO: 249)
57-3 1f TTCCCAGGAGGCAAGTTATG (SEQ ID NO: 250)
57-3 1fR GCTATGACCATGATTACGCCTTCCCAGGAGGCAAGTTATG (SEQ ID NO: 251)
57-3 1r TGGGCTTAGGTGATCCTCAC (SEQ ID NO: 252)
57-3 1rR GCTATGACCATGATTACGCCTGGGCTTAGGTGATCCTCAC (SEQ ID NO: 253)
57-3 1fU TGTAAAACGACGGCCAGTTTCCCAGGAGGCAAGTTATG (SEQ ID NO: 254)
57-4 1f ACCAAGCCCAACTAATCAGC (SEQ ID NO: 255)
57-4 1fR GCTATGACCATGATTACGCCACCAAGCCCAACTAATCAGC (SEQ ID NO: 256)
57-4 1r ATGCCTGTAATCCCAGCACT (SEQ ID NO: 257)
57-4 1rR GCTATGACCATGATTACGCCATGCCTGTAATCCCAGCACT (SEQ ID NO: 258)
57-4 1fU TGTAAAACGACGGCCAGTACCAAGCCCAACTAATCAGC (SEQ ID NO: 259)
57-5 1f ACTGCAAGCCCTCTCTGAAC (SEQ ID NO: 260)
57-5 1r CGAAGACTGCGAAACAGACA (SEQ ID NO: 261)
58-1 1f CTAGTGCCGTGCAGAATGAG (SEQ ID NO: 262)
58-1 1r GGCCACTGCAATGAGATACA (SEQ ID NO: 263)
58-2 1f GAGAAACAGTTCCAGGGTGG (SEQ ID NO: 264)
58-2 1fR GCTATGACCATGATTACGCCGAGAAACAGTTCCAGGGTGG (SEQ ID NO: 265)
58-2 1r AAACTGAGGCTGGGAGAGGT (SEQ ID NO: 266)
58-2 1rR GCTATGACCATGATTACGCCAAACTGAGGCTGGGAGAGGT (SEQ ID NO: 267)
58-3 1f TGTTCTTCCTCACAGGGAGG (SEQ ID NO: 268)
58-3 1fR GCTATGACCATGATTACGCCTGTTCTTCCTCACAGGGAGG (SEQ ID NO: 269)
58-3 1r TCCCCAAATCTGTCCAGTTC (SEQ ID NO: 270)
58-3 1rR GCTATGACCATGATTACGCCTCCCCAAATCTGTCCAGTTC (SEQ ID NO: 271)
58-4 1f CATACCTGGAGGGATGCTTG (SEQ ID NO: 272)
58-4 1fR GCTATGACCATGATTACGCCCATACCTGGAGGGATGCTTG (SEQ ID NO: 273)
58-4 1r TAGGTTGCTGTGTGGCTTCA (SEQ ID NO: 274)
58-4 1rR GCTATGACCATGATTACGCCTAGGTTGCTGTGTGGCTTCA (SEQ ID NO: 275)
58-5 1f CTTCTGACAAAGCAGAGGCC (SEQ ID NO: 276)
58-5 1fR GCTATGACCATGATTACGCCCTTCTGACAAAGCAGAGGCC (SEQ ID NO: 277)
58-5 1r GCTGTTAGGGTTACCATCGC (SEQ ID NO: 278)
58-5 1rR GCTATGACCATGATTACGCCGCTGTTAGGGTTACCATCGC (SEQ ID NO: 279)
58-6 1f CCACAGGGTGATATGCTGTC (SEQ ID NO: 280)
58-6 1fR GCTATGACCATGATTACGCCCCACAGGGTGATATGCTGTC (SEQ ID NO: 281)
58-6 1r CGCCTGGCTACTTTGGTACT (SEQ ID NO: 282)
58-6 1rR GCTATGACCATGATTACGCCCGCCTGGCTACTTTGGTACT (SEQ ID NO: 283)
58-7 1f CCAAATGAACCTGGGCAAC (SEQ ID NO: 284)
58-7 1fR GCTATGACCATGATTACGCCCCAAATGAACCTGGGCAAC (SEQ ID NO: 285)
58-7 1r GTCTTGGCTCACTGCAACCT (SEQ ID NO: 286)
58-7 1rR GCTATGACCATGATTACGCCGTCTTGGCTCACTGCAACCT (SEQ ID NO: 287)
58-8 1f GCCAAGACTGTGCTACTGCA (SEQ ID NO: 288)
58-8 1r CAGGGAGCAGATCTTACCCA (SEQ ID NO: 289)
58-9 1f TGGGATTAACTAGGGAGGGG (SEQ ID NO: 290)
58-9 1fR GCTATGACCATGATTACGCCTGGGATTAACTAGGGAGGGG (SEQ ID NO: 291)
58-9 1r TGCTGCTGTCTCCATCTCTG (SEQ ID NO: 292)
58-9 1rR GCTATGACCATGATTACGCCTGCTGCTGTCTCCATCTCTG (SEQ ID NO: 293)
58-10 1f ACAGACCAGCAGTGAAACCTG (SEQ ID NO: 294)
58-10 1fR GCTATGACCATGATTACGCCACAGACCAGCAGTGAAACCTG (SEQ ID NO: 295)
58-10 1r GTTCACTGCAACCTCTGCCT (SEQ ID NO: 296)
58-10 1rR GCTATGACCATGATTACGCCGTTCACTGCAACCTCTGCCT (SEQ ID NO: 297)
58-11 1f GTTCTCGTAGATGCTTGCAGG (SEQ ID NO: 298)
58-11 1fR GCTATGACCATGATTACGCCGTTCTCGTAGATGCTTGCAGG (SEQ ID NO: 299)
58-11 1r GAGGCAGGAGGATCACTTGA (SEQ ID NO: 300)
58-11 1rR GCTATGACCATGATTACGCCGAGGCAGGAGGATCACTTGA (SEQ ID NO: 301)
58-12 1f TGAGCTGAGATCACACCGCT (SEQ ID NO: 302)
58-12 1fR GCTATGACCATGATTACGCCTGAGCTGAGATCACACCGCT (SEQ ID NO: 303)
58-12 1r AGTTGACACTTTGCTGGCCT (SEQ ID NO: 304)
58-12 1rR GCTATGACCATGATTACGCCAGTTGACACTTTGCTGGCCT (SEQ ID NO: 305)
58-13 1f CTCTGCATGGCTTAGGGACA (SEQ ID NO: 306)
58-13 1fR GCTATGACCATGATTACGCCCTCTGCATGGCTTAGGGACA (SEQ ID NO: 307)
58-13 1r GGCTGCTCTCTGCATTCTCT (SEQ ID NO: 308)
58-13 1rR GCTATGACCATGATTACGCCGGCTGCTCTCTGCATTCTCT (SEQ ID NO: 309)
58-14 1f CTGGCTTTAGCTTGCATTTCC (SEQ ID NO: 310)
58-14 1fR GCTATGACCATGATTACGCCCTGGCTTTAGCTTGCATTTCC (SEQ ID NO: 311)
58-14 1r TGCCTCAGTTTTCTCACCTGT (SEQ ID NO: 312)
58-14 1rR GCTATGACCATGATTACGCCTGCCTCAGTTTTCTCACCTGT (SEQ ID NO: 313)
58-15 1f CAAACAGCCACTGAGCATGT (SEQ ID NO: 314)
58-15 1fR GCTATGACCATGATTACGCCCAAACAGCCACTGAGCATGT (SEQ ID NO: 315)
58-15 1r TCCTCCTGTAGATGCCCAAG (SEQ ID NO: 316)
58-15 1rR GCTATGACCATGATTACGCCTCCTCCTGTAGATGCCCAAG (SEQ ID NO: 317)
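In Table 4 the "U" and "R" variants of each primer appear to be the corresponding locus-specific primer with a 5′ tail added: TGTAAAACGACGGCCAGT (the M13 -21 universal sequence) for U primers and GCTATGACCATGATTACGCC for R primers. This reading is inferred from the sequences rather than stated in the text. Under that assumption, a short script can compare each tailed primer with its untailed partner and flag discrepancies for checking against the sequence listing; the function name is illustrative.

```python
U_TAIL = "TGTAAAACGACGGCCAGT"     # M13 -21 universal sequence
R_TAIL = "GCTATGACCATGATTACGCC"   # 5' tail carried by the "R" primers

def check_tailed(name: str, tailed: str, base: str) -> str:
    """Compare a tailed primer with (tail + untailed primer)."""
    tail = U_TAIL if name.endswith("U") else R_TAIL
    if not tailed.startswith(tail):
        return f"{name}: unexpected 5' tail"
    insert = tailed[len(tail):]
    if insert == base:
        return f"{name}: consistent"
    return f"{name}: insert {insert} differs from base primer {base}"

# A few entries from Table 4: (tailed name, tailed sequence, base sequence).
entries = [
    ("E1x1 1fU", "TGTAAAACGACGGCCAGTCAGGGTTTCATCCTTTGTGG", "CAGGGTTTCATCCTTTGTGG"),
    ("E1xA 1fR", "GCTATGACCATGATTACGCCATTTTGCTGGCTTAGGCAGT", "ATCTTGCTGGCTTAGCCAGT"),
    ("E1xB 1fU", "TGTAAAACGACGGCCAGTCCTGTTGGTTATTTCCGATGG", "CCTGTGGGTTATTTCCGATGG"),
]
for name, tailed, base in entries:
    print(check_tailed(name, tailed, base))
```

The E1xA 1fR and E1xB 1fU entries, for example, differ by one or two bases from their untailed partners; which version is intended would need to be confirmed against the formal sequence listing.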

TABLE 5 LRP-5 exon SNPs
Exon | Polymorphism | Amino acid change | Location
exon E | G to A | Intronic | 10 bp 3′ of exon E
exon E | C to T | none | Phe³³¹, exon E
exon F | G to A | Intronic | 50 bp 5′ of exon F
exon G | C to T | none | Phe⁵¹⁸, exon G
exon I | C to T | none | Asn⁷⁰⁹, exon I
exon P | C to T | Intronic | 82 bp 5′ of exon P
exon N | C to T | none | Asp¹⁰⁶⁸, exon N
exon N | A to G | none | Val¹⁰⁸⁸, exon N
exon Q | C to T | Ala¹²⁹⁹ to Val | Ala¹²⁹⁹, exon Q
exon U | T to C | Val¹⁴⁹⁴ to Ala | Val¹⁴⁹⁴, exon U

TABLE 6 SNPs Identified in the IDDM4 Locus: List of PCR Fragments and Available RFLP Sites for Analysis
PCR product | SNP | Location | Enzyme
Contig 57
57-1 | a/t | 13363 | none
57-1 | a/g | 13484 | Bst XI
57-2 | a/g | 14490 | none
57-2 | a/g | 14885 | none
57-3 | c/g | 18776 | Mae II
57-3 | t/c | 18901 | Msp I
57-3 | a/g | 19313 | Afl II
57-4 | 22T/25T | 20800 | none
57-5 | g/a | 23713 | Msp I
Contig 58
58-15 | c/t | 3015 | none
58-14 | g/c | 3897 | Pfl MI
58-13 | c/g | 5574 | Eco NI
58-12 | t/g | 6051 | none
58-11 | a/g | 8168 | none
58-10 | a/g | 8797 | none
58-9 | g/t | 9445 | none
58-9 | c/t | 9718 | none
58-8 | insert T | 10926 | Pst I
58-7 | t/a | 11449 | Bst XI
58-7 | t/c | 11468 | none
58-6 | t/c | 11878 | none
58-6 | g/a | 12057 | none
58-6 | a/g | 12180 | Hga I
58-5 | c/t | 14073 | none
58-4 | a/g | 15044 | Mae II
58-4 | t/c | 15354 | none
58-3 | insert G | 16325 | none
58-2 | g/a | 17662 | none
58-1 | g/t | 18439 | Bgl II
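Table 6 lists restriction enzymes whose sites distinguish SNP alleles. Whether a SNP creates or destroys a site can be checked by counting occurrences of the enzyme's recognition sequence, on both strands, in each allele's flanking sequence. The sketch below covers only the listed enzymes with exact, non-degenerate recognition sequences (Msp I CCGG, Mae II ACGT, Afl II CTTAAG, Pst I CTGCAG, Bgl II AGATCT, Hga I GACGC). Bst XI, Pfl MI and Eco NI have interrupted, degenerate sites and are left out. The flanking sequences in the example are hypothetical.

```python
SITES = {
    "Msp I": "CCGG", "Mae II": "ACGT", "Afl II": "CTTAAG",
    "Pst I": "CTGCAG", "Bgl II": "AGATCT", "Hga I": "GACGC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def count_sites(seq: str, enzyme: str) -> int:
    """Occurrences of the enzyme's site on either strand of seq."""
    site = SITES[enzyme]
    seq = seq.upper()
    patterns = {site, revcomp(site)}  # a set, so palindromic sites count once
    return sum(
        1
        for pattern in patterns
        for i in range(len(seq) - len(pattern) + 1)
        if seq[i:i + len(pattern)] == pattern
    )

def rflp_informative(left: str, allele1: str, allele2: str, right: str, enzyme: str) -> bool:
    """True if the two alleles give different numbers of sites for the enzyme."""
    return count_sites(left + allele1 + right, enzyme) != count_sites(left + allele2 + right, enzyme)

# Hypothetical flanks around a t/c SNP: the c allele creates CCGG (Msp I).
print(rflp_informative("AATCC", "T", "C", "GGTTA", "Msp I"))  # prints True
```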

TABLE 7 SNP primers
57-1 1f GCAACAGAGCAAGACCCTGT (SEQ ID NO:240)
57-1 1fR GCTATGACCATGATTACGCCGCAACAGAGCAAGACCCTGT (SEQ ID NO:241)
57-1 1r AAATTAGCCAGGCATGGTG (SEQ ID NO:242)
57-1 1rR GCTATGACCATGATTACGCCAAATTAGCCAGGCATGGTG (SEQ ID NO:243)
57-1 1fU TGTAAAACGACGGCCAGTGCAACAGAGCAAGACCCTGT (SEQ ID NO:244)
57-2 1f CCTGCAGAAGGAAACCTGAC (SEQ ID NO:245)
57-2 1fR GCTATGACCATGATTACGCCCCTGCAGAAGGAAACCTGAC (SEQ ID NO:246)
57-2 1r CTGCATCTTTGCCACCATG (SEQ ID NO:247)
57-2 1rR GCTATGACCATGATTACGCCCTGCATCTTTGCCACCATG (SEQ ID NO:248)
57-2 1fU TGTAAAACGACGGCCAGTCCTGCAGAAGGAAACCTGAC (SEQ ID NO:249)
57-3 1f TTCCCAGGAGGCAAGTTATG (SEQ ID NO:250)
57-3 1fR GCTATGACCATGATTACGCCTTCCCAGGAGGCAAGTTATG (SEQ ID NO:251)
57-3 1r TGGGCTTAGGTGATCCTCAC (SEQ ID NO:252)
57-3 1rR GCTATGACCATGATTACGCCTGGGCTTAGGTGATCCTCAC (SEQ ID NO:253)
57-3 1fU TGTAAAACGACGGCCAGTTTCCCAGGAGGCAAGTTATG (SEQ ID NO:254)
57-4 1f ACCAAGCCCAACTAATCAGC (SEQ ID NO:255)
57-4 1fR GCTATGACCATGATTACGCCACCAAGCCCAACTAATCAGC (SEQ ID NO:256)
57-4 1r ATGCCTGTAATCCCAGCACT (SEQ ID NO:257)
57-4 1rR GCTATGACCATGATTACGCCATGCCTGTAATCCCAGCACT (SEQ ID NO:258)
57-4 1fU TGTAAAACGACGGCCAGTACCAAGCCCAACTAATCAGC (SEQ ID NO:259)
57-5 1f ACTGCAAGCCCTCTCTGAAC (SEQ ID NO:260)
57-5 1r CGAAGACTGCGAAACAGACA (SEQ ID NO:261)
58-1 1f CTAGTGCCGTGCAGAATGAG (SEQ ID NO:262)
58-1 1r GGCCACTGCAATGAGATACA (SEQ ID NO:263)
58-2 1f GAGAAACAGTTCCAGGGTGG (SEQ ID NO:264)
58-2 1fR GCTATGACCATGATTACGCCGAGAAACAGTTCCAGGGTGG (SEQ ID NO:265)
58-2 1r AAACTGAGGCTGGGAGAGGT (SEQ ID NO:266)
58-2 1rR GCTATGACCATGATTACGCCAAACTGAGGCGGAGAGGT (SEQ ID NO:267)
58-3 1f TGTTCTTCCTCACAGGGAGG (SEQ ID NO:268)
58-3 1fR GCTATGACCATGATTACGCCTGTTCTTCCTCACAGGGAGG (SEQ ID NO:269)
58-3 1r TCCCCAAATCTGTCCAGTTC (SEQ ID NO:270)
58-3 1rR GCTATGACCATGATTACGCCTCCCCAAATCTGTCCAGTTC (SEQ ID NO:271)
58-4 1f CATACCTGGAGGGATGCTTG (SEQ ID NO:272)
58-4 1fR GCTATGACCATGATTACGCCCATACCTGGAGGGATGCTTG (SEQ ID NO:273)
58-4 1r TAGGTTGCTGTGTGGCTTCA (SEQ ID NO:274)
58-4 1rR GCTATGACCATGATTACGCCTAGGTTGCTGTGTGGCTTCA (SEQ ID NO:275)
58-5 1f CTTCTGACAAAGCAGAGGCC (SEQ ID NO:276)
58-5 1fR GCTATGACCATGATTACGCCCTTCTGACAAAGCAGAGGCC (SEQ ID NO:277)
58-5 1r GCTGTTAGGGTTACCATCGC (SEQ ID NO:278)
58-5 1rR GCTATGACCATGATTACGCCGCTGTTAGGGTTACCATCGC (SEQ ID NO:279)
58-6 1f CCACAGGGTGATATGCTGTC (SEQ ID NO:280)
58-6 1fR GCTATGACCATGATTACGCCCCACAGGGTGATATGCTGTC (SEQ ID NO:281)
58-6 1r CGCCTGGCTACTTTGGTACT (SEQ ID NO:282)
58-6 1rR GCTATGACCATGATTACGCCCGCCTGGCTACTTTGGTACT (SEQ ID NO:283)
58-7 1f CCAAATGAACCTGGGCAAC (SEQ ID NO:284)
58-7 1fR GCTATGACCATGATTACGCCCCAAATGAACCTGGGCAAC (SEQ ID NO:285)
58-7 1r GTCTTGGCTCACTGCAACCT (SEQ ID NO:286)
58-7 1rR GCTATGACCATGATTACGCCGTCTTGGCACTGCAACCT (SEQ ID NO:287)
58-8 1f GCCAAGACTGTGCTACTGCA (SEQ ID NO:288)
58-8 1r CAGGGAGCAGATCTTACCCA (SEQ ID NO:289)
58-9 1f TGGGATTAACTAGGGAGGGG (SEQ ID NO:290)
58-9 1fR GCTATGACCATGATTACGCCTGGGATTAACTAGGGAGGGG (SEQ ID NO:291)
58-9 1r TGCTGCTGTCTCCATCTCTG (SEQ ID NO:292)
58-9 1rR GCTATGACCATGATTACGCCTGCTGCTGTCTCCATCTCTG (SEQ ID NO:293)
58-10 1f ACAGACCAGCAGTGAAACCGT (SEQ ID NO:294)
58-10 1fR GCTATGACCATGATTACGCCACAGACCAGCAGTGAAACCTG (SEQ ID NO:295)
58-10 1r GTTCACTGCAACCTCTGCCT (SEQ ID NO:296)
58-10 1rR GCTATGACCATGATTACGCCGTTCACTGCAACCTCTGCCT (SEQ ID NO:297)
58-11 1f GTTCTCGTAGATGCTTGCAGG (SEQ ID NO:298)
58-11 1fR GCTATGACCATGATTACGCCGTTCTCGTAGATGCTTGCAGG (SEQ ID NO:299)
58-11 1r GAGGCAGGAGGATCACTTGA (SEQ ID NO:300)
58-11 1rR GCTATGACCATGATTACGCCGAGGCAGGAGGATCACTTGA (SEQ ID NO:301)
58-12 1f TGAGCTGAGATCACACCGCT (SEQ ID NO:302)
58-12 1fR GCTATGACCATGATTACGCCTGAGCTGAGATCACACCGCT (SEQ ID NO:303)
58-12 1r AGTTGACACTTTGCTGGCCT (SEQ ID NO:304)
58-12 1rR GCTATGACCATGATTACGCCAGTTGACACTTTGCTGGCCT (SEQ ID NO:305)
58-13 1f CTCTGCATGGCTTAGGGACA (SEQ ID NO:306)
58-13 1fR GCTATGACCATGATTACGCCCTCTGCATGGCTTAGGGACA (SEQ ID NO:307)
58-13 1r GGCTGCTCTCTGCATTCTCT (SEQ ID NO:308)
58-13 1rR GCTATGACCATGATTACGCCGGCTGCTCTCTGCATTCTCT (SEQ ID NO:309)
58-14 1f CTGGCTTTAGCTTGCATTTCC (SEQ ID NO:310)
58-14 1fR GCTATGACCATGATTACGCCCTGGCTTTAGCTTGCATTTCC (SEQ ID NO:311)
58-14 1r TGCCTCAGTTTTCTCACCGT (SEQ ID NO:312)
58-14 1rR GCTATGACCATGATTACGCCTGCCTCAGTTTTCTCACCTGT (SEQ ID NO:313)
58-15 1f CAAACAGCCACTGAGCATGT (SEQ ID NO:314)
58-15 1fR GCTATGACCATGATTACGCCCAAACAGCCACTGAGCATGT (SEQ ID NO:315)
58-15 1r TCCTCCTGTAGATCCCCAAG (SEQ ID NO:316)
58-15 1rR GCTATGACCATGATTACGCCTCCTCCTGTAGATGCCCAAG (SEQ ID NO:317)
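
In Table 7, each primer with the suffix R appears to be its gene-specific counterpart (suffix f or r) with the 20-nt 5′ tail GCTATGACCATGATTACGCC added. Each primer with the suffix U carries the 18-nt tail TGTAAAACGACGGCCAGT, which is the widely used M13(-21) forward sequencing primer. The minimal Python sketch below assumes exactly that relationship and checks a tailed primer against its base primer. The helper names are illustrative only, and the four entries are transcribed from the table.

```python
# 5' tails inferred by inspection of Table 7 (an assumption, not stated in the text).
TAIL_R = "GCTATGACCATGATTACGCC"  # shared by the "fR"/"rR" primers
TAIL_U = "TGTAAAACGACGGCCAGT"    # shared by the "fU" primers (M13(-21) forward)


def expected_tailed(base: str, tail: str) -> str:
    """Primer expected from adding a 5' tail to a gene-specific primer."""
    return tail + base


def check_pair(base: str, tailed: str, tail: str) -> bool:
    """True if `tailed` is exactly `tail` followed by `base`."""
    return tailed == expected_tailed(base, tail)


# (name of tailed primer, base primer, tailed primer, tail), copied from Table 7
ENTRIES = [
    ("57-1 1fR", "GCAACAGAGCAAGACCCTGT",
     "GCTATGACCATGATTACGCCGCAACAGAGCAAGACCCTGT", TAIL_R),
    ("57-1 1fU", "GCAACAGAGCAAGACCCTGT",
     "TGTAAAACGACGGCCAGTGCAACAGAGCAAGACCCTGT", TAIL_U),
    ("57-2 1rR", "CTGCATCTTTGCCACCATG",
     "GCTATGACCATGATTACGCCCTGCATCTTTGCCACCATG", TAIL_R),
    ("58-2 1rR", "AAACTGAGGCTGGGAGAGGT",
     "GCTATGACCATGATTACGCCAAACTGAGGCGGAGAGGT", TAIL_R),
]


def inconsistent(entries):
    """Names of tailed primers that are not tail + base as listed."""
    return [name for name, base, tailed, tail in entries
            if not check_pair(base, tailed, tail)]


if __name__ == "__main__":
    print(inconsistent(ENTRIES))
```

Run on these four entries, the check reports 58-2 1rR. As listed, that primer reads ...AAACTGAGGCGGAGAGGT, whereas tail plus 58-2 1r gives ...AAACTGAGGCTGGGAGAGGT. Applied to the whole table as transcribed, the same check also flags 58-7 1rR, 58-10 1fR, 58-14 1rR and 58-15 1rR.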

TABLE 8 Primers designed by microsatellite rescue for genotyping and restriction mapping of the IDDM4 region on chromosome 11q13. The other primers used are published, and are also in the Genome Database.
255CA3F GCCGAGAATTGTCATCTTAACT (SEQ ID NO:318)
255CA3R GGATTGAAAGCTGCAAACTACA (SEQ ID NO:319)
255CA5F GGAGCCACCACATCCAGTTA (SEQ ID NO:320)
255CA5R TGGAGGGATTGCTTGAGG (SEQ ID NO:321)
255CA6F AGGTGTACACCACCATGCCT (SEQ ID NO:322)
255CA6R TGGTGCCAATTATTGCTGC (SEQ ID NO:323)
14LCA5F AGATCTTATACACATGTGCGCG (SEQ ID NO:324)
14LCA5R AGGTGACATCACTTACAGCGG (SEQ ID NO:325)
L15CA1F ATTACCCAGGCATGGTGC (SEQ ID NO:326)
L15CA1R CAGGCACTTCTTCCAGGTCT (SEQ ID NO:327)
18018ACF AGGGTTACACTGGAGTTTGC (SEQ ID NO:328)
18018ACR AAACCTTCAATGTGTTCATTAAAAC (SEQ ID NO:329)
E0864CAF TCAACTTTATTGGGGGTTTA (SEQ ID NO:330)
E0864CAR AAGGTAAAAGTCCAAAATGG (SEQ ID NO:331)
H0570POLYAF GGACAGTCAGTTATTGAAATG (SEQ ID NO:332)
H0570POLYAR TTTCCTCTCTGGGAGTCTCT (SEQ ID NO:333)
E0864CA was obtained from the cosmid E0864.
H0570POLYA was obtained from the cosmid H0570.
255CA5, 255CA3 and 255CA6 were obtained from the PAC 255_m_19.
14LCA5 and L15CA1 were obtained from the BAC 14_1_15.
18018AC was obtained from the PAC 18_o_18.
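
Microsatellite rescue aims to locate simple-sequence repeats within cloned DNA (here the cosmids, PACs and BAC named in the notes to Table 8), so that primers can be placed in unique flanking sequence. The "CA" in names such as 255CA3 suggests (CA)n dinucleotide repeats. The Python sketch below shows only the repeat-locating step, as a computational stand-in for whatever laboratory screening was actually used. It assumes a plain nucleotide string and a hypothetical threshold of 8 repeat units, and it does not design primers. The function name is illustrative.

```python
import re


def find_dinucleotide_repeats(seq: str, unit: str = "CA", min_units: int = 8):
    """Return (start, end, units) for each run of `unit` repeated at least
    `min_units` times. Coordinates are 1-based and inclusive.

    Only the strand given is scanned; a (CA)n tract on one strand reads as
    (TG)n on the other, so call again with unit="TG" to cover both strands.
    """
    # Builds e.g. "(?:CA){8,}": a greedy run of at least min_units copies.
    pattern = re.compile(f"(?:{re.escape(unit.upper())}){{{min_units},}}")
    hits = []
    for m in pattern.finditer(seq.upper()):
        run_length = m.end() - m.start()
        hits.append((m.start() + 1, m.end(), run_length // len(unit)))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the IDDM4 clones.
    demo = "GGATCC" + "CA" * 12 + "TTGACC"
    print(find_dinucleotide_repeats(demo))
```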

TABLE 9 PCR Primers for obtaining LRP-3 cDNA
A.) Primers located within human LRP-3 cDNA. The primers are numbered beginning at nucleotide 1 in FIG. 17(a).
1F (muex 1f) ATGGAGCCCGAGTGAGC (SEQ ID NO: 49)
200f TCAAGCTGGAGTCCACCATC (SEQ ID NO: 334)
218R (27R) ATGGTGGACTCCAGCTTGAC (SEQ ID NO: 50)
256F (1F) TTCCAGTTTTCCAAGGGAG (SEQ ID NO: 51)
265R (26R) AAAACTGGAAGTCCACTGCG (SEQ ID NO: 52)
318R (4R) GGTCTGCTTGATGGCCTC (SEQ ID NO: 53)
343F (2F) GTGCAGAACGTGGTCATCT (SEQ ID NO: 54)
361R (21R) GTGCAGAACGTGGTCATCT (SEQ ID NO: 54)
622R (2R) AGTCCACAATGATCTTCCGG (SEQ ID NO: 55)
638F (4F) CCAATGGACTGACCATCGAC (SEQ ID NO: 56)
657R (1R) GTCGATGGTCAGTCCATTGG (SEQ ID NO: 57)
936f CACTCGCTGTGAGGAGGAC (SEQ ID NO: 335)
956R (22R) TTGTCCTCCTCACAGCGAG (SEQ ID NO: 58)
1040f (51f) ACAACGGCAGGACGTGTAAG (SEQ ID NO: 336)
1174f (40f) ATTGCCATCGACTACGACC (SEQ ID NO: 337)
1277f (52f) TGGTCAACACCGAGATCAAC (SEQ ID NO: 338)
1333f AACCTCTACTGGACCGACAC (SEQ ID NO: 339)
1462f (41f) CTCATGTACTGGACAGACT (SEQ ID NO: 340)
1481R (23R) CAGTCTGTCCAGTACATGAG (SEQ ID NO: 60)
1607f (50f) GAGACGCCAAGACAGACAAG (SEQ ID NO: 341)
1713F (21F) GGACTTCATCTACTGGACTG (SEQ ID NO: 59)
1732r (40r) CAGTCCAGTAGATGAAGTCC (SEQ ID NO: 342)
1904r (k275r) GTGAAGAAGCACAGGTGGCT (SEQ ID NO: 343)
1960r TCATGTCACTCAGCAGCTCC (SEQ ID NO: 344)
1981F (22F) GCCTTCTTGGTCTTCACCAG (SEQ ID NO: 61)
2261F (23F) GGACCAACAGAATCGAAGTG (SEQ ID NO: 62)
2484R (5R) GTCAATGGTGAGGTCGT (SEQ ID NO: 63)
2519F (5F) ACACCAACATGATCGAGTCG (SEQ ID NO: 64)
2780r CCGTTGTTGTGCATACAGTC (SEQ ID NO: 345)
3011F (24F) ACAAGTTCATCTACTGGGTG (SEQ ID NO: 65)
3154F (25F) CGGACACTGTTCTGGACGTG (SEQ ID NO: 66)
3173R (25R) CACGTCCAGAACAGTGTCCG (SEQ ID NO: 67)
3556R (3R) TCCAGTAGAGATGCTTGCCA (SEQ ID NO: 68)
3577F (3F) ATCGAGCGTGTGGAGAAGAC (SEQ ID NO: 69)
3851r GTGGCACATGCAAACTGGTC (SEQ ID NO: 346)
4094F (30F) TCCTCATCAAACAGCAGTGC (SEQ ID NO: 70)
4173R (6R) CGGCTTGGTGATTTCACAC (SEQ ID NO: 71)
4687F (6F) GTGTGTGACAGCGACTACAGC (SEQ ID NO: 72)
4707R (30R) GCTGTAGTCGCTGTCACACAC (SEQ ID NO: 73)
5061R (7R) GTACAAAGTTCTCCCAGCCC (SEQ ID NO: 74)
3′ end with XbaI site:
5069r GCTCTAGAGTACAAAGTTCTCCCAGCCC (SEQ ID NO: 347)
Soluble/HSV/His primers:
HLRP3_His_primer1 (4203r) ATCCTCGGGGTCTTCCGGGGCGAGTTCTGGCTGGCTACTGCTGTGGGCCGGGCT (SEQ ID NO: 348)
HLRP3_His_primer2 TGGATATCTCAGTGGTGGTGGTGGTGGTGCTCGACATCCTCGGGGTCTTCCGGG (SEQ ID NO: 349)
HLRP3_5′_primer (49f) TAGAATTCGCCGCCACCATGGAGGCAGCGCCGCCC (SEQ ID NO: 350)
B.) Mouse Lrp-3 cDNA primers. The primers are numbered beginning at nucleotide 1 in FIG. 18(a).
13f (mulrp3 5f) GAGGCGGGAGCAAGAGG (SEQ ID NO: 351)
68f (MucD 1f) GC HindIII CATGGAGCCCGAGTGAGC (SEQ ID NO: 352)
69f (muex 1f) ATGGAGCCCGAGTGAGC (SEQ ID NO: 353)
83r (muex 1r) TCACTCGGGCTCCATGG (SEQ ID NO: 354)
171f (MucD 2f) TGCTGTACTGCAGCTTGGTC (SEQ ID NO: 355)
300f (MucD 10F) ATGCAGCTGCTGTAGACTTCC (SEQ ID NO: 356)
378r (mulrp3 3r) GTCTGTTTGATGGCCTCCTC (SEQ ID NO: 357)
414r (MucD 7R) ATGTTCTGTGCAGCACCTCC (SEQ ID NO: 358)
445r (mulrp3 4r) GCCATCAGGTGACACGAG (SEQ ID NO: 359)
536f (MucD 11F) AAGGTTCTCTTCTGGCAGGAC (SEQ ID NO: 360)
619r (MucD 12R) CCAGTCAGTCCAGTACATG (SEQ ID NO: 361)
714f (museq 1f) TCGACCTGGAGGAACAGAAG (SEQ ID NO: 362)
752f (mulrpAb 1f) AAGCTCAGCTTCATCCACCG (SEQ ID NO: 363)
765r (MucD 8R) ATGAAGCTGAGCTTGGCATC (SEQ ID NO: 364)
915f (MucD 12F) AGCAGAGGAAGGAGATCCTTAG (SEQ ID NO: 365)
957r (MucD 9R) TCCATGGGTGAGTACAGAGC (SEQ ID NO: 366)
1105r (museq 1r) ATTGTCCTGCAACTGCACAC (SEQ ID NO: 367)
1232f (MucD 13F) GCCATTGCCATTGACTACG (SEQ ID NO: 368)
1254r (MucD 10R) GGATCGTAGTCAATGGCAATG (SEQ ID NO: 369)
1425f (MucD 14F) GAATTGAGGTGACTCGCCTC (SEQ ID NO: 370)
1433r (MucD 18R) CCTCAATTCTGTAGTGCCTG (SEQ ID NO: 371)
1501f (muxt 4f) TGTGTTGCACCCTGTGATG (SEQ ID NO: 372)
1579r (MucD 11R) ATCTAGGTTGGCGCATTCG (SEQ ID NO: 373)
1610r (MucD 13R) AGGTGTTCACCAGGACATG (SEQ ID NO: 374)
1710r (mulrpAb 1r) GCGAGCTCCCGTCTATGTTGATCACCTCG (SEQ ID NO: 375)
1868f (MucD 3f) GACCTGATGGGACTCAAAGC (SEQ ID NO: 376)
2062r (MucD 2r) GCTGGTGAATACCAGGAAGG (SEQ ID NO: 377)
2103f (MucD 4f) ACGATGTGGCTATCCCACTC (SEQ ID NO: 378)
2422r (MucD 14R) AGTAGGATCCAGAGCCAGAG (SEQ ID NO: 379)
2619f (MucD 5f) AGCGCATGGTGATAGCTGAC (SEQ ID NO: 380)
2718r (MucD 3r) CGTTCAATGCTATGCAGGTTC (SEQ ID NO: 381)
2892f (MucD 15F) GTGCTTCACACTACACGCTG (SEQ ID NO: 382)
2959f (MucD 6f) CAGCCAGAAATTTGCCATC (SEQ ID NO: 383)
3218r (MucD 4r) TCCGGCTGTAGATGTCAATG (SEQ ID NO: 384)
3237f (MucD 7f) AGGCCACCAACACTATCAATG (SEQ ID NO: 385)
3348r (MucD 52R) TACCCTCGCTCAGCATTGAC (SEQ ID NO: 386)
3554f (MucD 8f) CTGGAAGATGCCAACATCG (SEQ ID NO: 387)
3684r (MucD 5r) TGAACCCTAGTCCGCTTGTC (SEQ ID NO: 388)
3848f (MucD 18F) CTGCAGAACCTGCTGACTTG (SEQ ID NO: 389)
3973f (MucD 19F) CCAGAGTGATGAAGAAGGCTG (SEQ ID NO: 390)
3981r (MucD 15R) TCACTCTGGTCAGCACACTC (SEQ ID NO: 391)
4079f (MucD 16F) CAGGATCGCTCTGATGAAGC (SEQ ID NO: 392)
4105r (MucD 53R) GCAGTTAGCTTCATCAGAGCG (SEQ ID NO: 393)
4234f (MucD 9f) ACCCTCTGATGACATCCCAG (SEQ ID NO: 394)
4270r (MucD 16R) AATGGCACTGCTGTGGGC (SEQ ID NO: 395)
4497r (MucD 6r) AGGCTCATGGAGCTCATCAC (SEQ ID NO: 396)
4589r (MucD 54R) ATAGTGTGGCCTTTGTGCTG (SEQ ID NO: 397)
4703f (MucD 17F) GTCATTCGAGGTATGGCACC (SEQ ID NO: 398)
4799r (MucD 17R) GGTAGTATTTGCTGCTCTTCC (SEQ ID NO: 399)
5114r (MucD 1r) GC XbaI AAAGTTTCCCAGCCCTGCC (SEQ ID NO: 400)
Soluble/adeno primers:
3554f (Mso1F) CTGGAAGATGCCAACATCG (SEQ ID NO: 401)
4264r (MHisR) GCTCTAGACTAGTGATGGTGATGGTGATGACTGCTGTGGGCTGGGATGTCATCAGAGGGTGG (SEQ ID NO: 402)
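
Table 9 names each primer by a position in the cDNA numbering. The suffix f/F marks sense-strand primers and r/R marks antisense primers. The first 300 nt of sequence 1 in the listing are consistent with a simple convention for that number. A forward primer is numbered by the position of its first base. A reverse primer is numbered by the position on the sense strand that pairs with its 5′ end. For example, 657R GTCGATGGTCAGTCCATTGG is the exact reverse complement of 638F CCAATGGACTGACCATCGAC. The Python sketch below encodes that reading. It is an inference checked only against the primers exercised here, and the function names are illustrative.

```python
# nt 1-300 of sequence 1 in the listing (human LRP-3 cDNA), in 10-nt blocks
# exactly as printed; only this stretch is needed for the primers below.
SEQ1_1_300 = (
    "ATGGAGCCCG" "AGTGAGCGCG" "GCGCGGGCCC" "GTCCGGCCGC" "CGGACAACAT" "GGAGGCAGCG"
    "CCGCCCGGGC" "CGCCGTGGCC" "GCTGCTGCTG" "CTGCTGCTGC" "TGCTGCTGGC" "GCTGTGCGGC"
    "TGCCCGGCCC" "CCGCCGCGGC" "CTCGCCGCTC" "CTGCTATTTG" "CCAACCGCCG" "GGACGTACGG"
    "CTGGTGGACG" "CCGGCGGAGT" "CAAGCTGGAG" "TCCACCATCG" "TGGTCAGCGG" "CCTGGAGGAT"
    "GCGGCCGCAG" "TGGACTTCCA" "GTTTTCCAAG" "GGAGCCGTGT" "ACTGGACAGA" "CGTGAGCGAG"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_position(template: str, primer: str, reverse: bool) -> int:
    """Number a primer by its 5' end in 1-based sense-strand coordinates.

    Forward primers match the template directly and are numbered by their
    first base. Reverse primers match as their reverse complement and are
    numbered by the last matched base, which pairs with the primer's 5' end.
    """
    target = reverse_complement(primer) if reverse else primer.upper()
    idx = template.upper().find(target)  # first occurrence, 0-based
    if idx < 0:
        raise ValueError(f"primer {primer} not found in template")
    return idx + len(target) if reverse else idx + 1


if __name__ == "__main__":
    for name, seq, rev in [
        ("1F", "ATGGAGCCCGAGTGAGC", False),
        ("200f", "TCAAGCTGGAGTCCACCATC", False),
        ("218R", "ATGGTGGACTCCAGCTTGAC", True),
        ("256F", "TTCCAGTTTTCCAAGGGAG", False),
        ("265R", "AAAACTGGAAGTCCACTGCG", True),
    ]:
        print(name, primer_position(SEQ1_1_300, seq, rev))
```

Each printed position agrees with the number in the primer's name (1, 200, 218, 256 and 265). The same convention would be expected to hold further along the cDNA, but only these primers were checked here.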

TABLE 10 Summary of Serum Chemistry Comparison of LRP3 treatment vs control
Variable | Mouse Type | Treatment (% diff ± SE) | p-value (Treatment)
triglycerides | WT + KO | −30 ± 14 | 0.025
alkaline phosphatase# | WT + KO | −49 ± 15 | 0.001
total cholesterol | KO only | −28 ± 15 | 0.073
total cholesterol | WT only | 30 ± 13 | 0.080
AST# | WT + KO | 8 ± 66 | 0.912
ALT# | WT + KO | −34 ± 51 | 0.431
BUN | WT + KO | −19 ± 15 | 0.195
# statistically significantly higher baseline values for controls.

TABLE 11 Summary for Blood Chemistry Variables Pooled over Knockout and Wild-Type Mice
Variable | Treat Group | Animal Type | n | baseline (mean ± % CV) | post-treat (mean ± % CV) | change | % change (95% CI) | p-value (% chg)
trigly (mg/dL) | Control | POOLED | 10 | 86 ± 13% | 186 ± 35% | 100 | 115% (61, 189) | <0.001
trigly (mg/dL) | LDL | POOLED | 9 | 92 ± 31% | 81 ± 55% | −12 | −13% (−35, 17) | 0.321
trigly (mg/dL) | LRP3 | POOLED | 8 | 99 ± 24% | 128 ± 36% | 29 | 30% (−10, 86) | 0.133
alkphos (U/L) | Control | POOLED | 10 | 190 ± 19% | 374 ± 30% | 184 | 97% (68, 130) | <0.001
alkphos (U/L) | LDL | POOLED | 9 | 162 ± 12% | 193 ± 29% | 31 | 19% (−1, 43) | 0.061
alkphos (U/L) | LRP3 | POOLED | 8 | 154 ± 13% | 146 ± 35% | −8 | −5% (−24, 19) | 0.604
totchol (mg/dL) | Control | POOLED | 10 | 116 ± 69% | 176 ± 86% | 60 | 51% (21, 89) | 0.002
totchol (mg/dL) | LDL | POOLED | 9 | 124 ± 58% | 87 ± 68% | −37 | −30% (−41, −17) | 0.001
totchol (mg/dL) | LRP3 | POOLED | 8 | 127 ± 62% | 166 ± 57% | 39 | 30% (9, 56) | 0.009
AST (U/L) | Control | POOLED | 9 | 41 ± 22% | 821 ± 69% | 780 | 1894% (1142, 3101) | <0.001
AST (U/L) | LDL | POOLED | 8 | 41 ± 25% | 362 ± 61% | 320 | 772% (369, 1520) | <0.001
AST (U/L) | LRP3 | POOLED | 8 | 33 ± 21% | 989 ± 129% | 955 | 2888% (953, 8380) | <0.001
ALT (U/L) | Control | POOLED | 10 | 33 ± 15% | 624 ± 59% | 591 | 1798% (1203, 2665) | <0.001
ALT (U/L) | LDL | POOLED | 8 | 32 ± 36% | 331 ± 42% | 299 | 938% (447, 1872) | <0.001
ALT (U/L) | LRP3 | POOLED | 8 | 25 ± 35% | 1020 ± 157% | 994 | 3944% (861, 16921) | <0.001
BUN (U/L) | Control | POOLED | 8 | 29 ± 12% | 23 ± 11% | −5 | −19% (−29, −7) | 0.008
BUN (U/L) | LDL | POOLED | 9 | 28 ± 19% | 25 ± 14% | −3 | −12% (−22, 1) | 0.062
BUN (U/L) | LRP3 | POOLED | 8 | 28 ± 12% | 19 ± 41% | −9 | −31% (−53, 2) | 0.058
Note: means given are geometric means. p-value is from a 2-sided paired t-test.
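
Table 11 reports geometric means together with asymmetric 95% confidence intervals for the percent change. This suggests, though the text does not say so, that the paired t-tests were run on log-transformed values. Under that assumption, the per-variable summary can be sketched in Python using only the standard library. The function name and the dictionary keys are illustrative, and no study data are reproduced here.

```python
import math
import statistics


def summarize_paired(baseline, post):
    """Summarize paired positive measurements on the log scale.

    Assumption (not stated in the source): the analysis works with
    per-animal log ratios log(post) - log(baseline). Returns the geometric
    means, their difference, the percent change implied by the mean log
    ratio, and the paired t statistic with its degrees of freedom.
    """
    if len(baseline) != len(post) or len(baseline) < 2:
        raise ValueError("need at least two paired observations")
    gm_base = statistics.geometric_mean(baseline)
    gm_post = statistics.geometric_mean(post)
    log_ratios = [math.log(p) - math.log(b) for b, p in zip(baseline, post)]
    n = len(log_ratios)
    mean_d = statistics.mean(log_ratios)
    se_d = statistics.stdev(log_ratios) / math.sqrt(n)
    if se_d == 0:
        raise ValueError("log ratios have zero spread; t is undefined")
    return {
        "gm_baseline": gm_base,
        "gm_post": gm_post,
        "change": gm_post - gm_base,
        # exp(mean log ratio) equals gm_post / gm_base for paired data
        "pct_change": 100.0 * (math.exp(mean_d) - 1.0),
        "t": mean_d / se_d,
        "df": n - 1,
    }
```

The mean of the per-animal log ratios back-transforms to GM(post)/GM(baseline). The percent change can therefore be read off the geometric means: for control triglycerides, 100 × (186/86 − 1) ≈ 116%. This is close to the tabulated 115%, given that the listed means are rounded. The "change" column appears to be the simple difference of the geometric means (186 − 86 = 100). A two-sided p-value would follow from the t statistic with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel` applied to the log values.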

TABLE 12 Regions of Sequence Similarity Between Human and Mouse LRP-3
Location in Human Sequence | Nucleotide Length | Percent Identity | BLAST Score | Exon Name
Contig 31
20235-20271 | 37 | 86 | 140 |
24410-24432 | 23 | 86 | 88 |
24464-24667 | 204 | 82 | 168, 223 | 6
24904-24995 | 52 | 82 | 179 |
25489-25596 | 108 | 61 | 360 |
26027-26078 | 52 | 80 | 170 |
26192-26261 | 70 | 84 | 251 |
26385-26486 | 102 | 87 | 393 |
28952-28993 | 42 | 85 | 156 |
41707-41903 | 197 | 90 | 823 |
42827-42898 | 66 | 81 | 222 |
43468-43585 | 117 | 85 | 316 |
50188-50333 | 146 | 86 | 550 |
54455-54494 | 40 | 80 | 128 |
54718-54750 | 33 | 87 | 129 |
59713-60123 | 411 | 87 | 1587 | A
78536-78680 | 145 | 80 | 473 | D
87496-87548 | 53 | 88 | 211 |
87598-87717 | 120 | 84 | 429 |
90772-90819 | 48 | 85 | 177 |
99457-99795 | 339 | 83 | 1182 | E
103094-103281 | 188 | 83 | 661 | F
116659-116954 | 296 | 81 | 985 | G
119754-120089 | 336 | 83 | 1167 | H
Contig 30
8920-9256 | 337 | 89 | 1026 | K
11238-11353 | 116 | 84 | *418 | L
18394-18648 | 255 | 80 | 825 | M
20020-20224 | 205 | 84 | 746 | N
20926-21153 | 228 | 83 | 807 | O
24955-25155 | 201 | 82 | 672 | P
29126-29288 | 163 | 74 | *437 | Q
33874-34033 | 160 | 85 | *593 | S
35205-35340 | 136 | 86 | 509 | T
41911-41911 | 55 | 80 | *176 | U
44629-44681 | 53 | 73 | *249 | V
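
The location column of Table 12 uses inclusive coordinates, so for most rows the length column equals end − start + 1 (for example, 20235-20271 gives 37). Percent identity, as in BLAST output, is the number of identical aligned positions divided by the alignment length. A short Python sketch of both calculations follows. The alignments behind the table are not reproduced in the document, so the alignment used here is a placeholder, and the function names are illustrative.

```python
def region_length(start: int, end: int) -> int:
    """Number of bases in a 1-based, inclusive range such as 20235-20271."""
    if end < start:
        raise ValueError(f"end {end} precedes start {start}")
    return end - start + 1


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned positions as a percentage of alignment length.

    Both strings are rows of the same pairwise alignment, with '-' for gaps;
    gap columns count toward the length but never as identities.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must be non-empty and of equal length")
    identical = sum(1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
                    if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    print(region_length(20235, 20271))                   # -> 37
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))  # -> 80.0 (placeholder)
```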

455 5098 base pairs nucleic acid single linear 1 ATGGAGCCCG AGTGAGCGCGGCGCGGGCCC GTCCGGCCGC CGGACAACAT GGAGGCAGCG 60 CCGCCCGGGC CGCCGTGGCCGCTGCTGCTG CTGCTGCTGC TGCTGCTGGC GCTGTGCGGC 120 TGCCCGGCCC CCGCCGCGGCCTCGCCGCTC CTGCTATTTG CCAACCGCCG GGACGTACGG 180 CTGGTGGACG CCGGCGGAGTCAAGCTGGAG TCCACCATCG TGGTCAGCGG CCTGGAGGAT 240 GCGGCCGCAG TGGACTTCCAGTTTTCCAAG GGAGCCGTGT ACTGGACAGA CGTGAGCGAG 300 GAGGCCATCA AGCAGACCTACCTGAACCAG ACGGGGGCCG CCGTGCAGAA CGTGGTCATC 360 TCCGGCCTGG TCTCTCCCGACGGCCTCGCC TGCGACTGGG TGGGCAAGAA GCTGTACTGG 420 ACGGACTCAG AGACCAACCGCATCGAGGTG GCCAACCTCA ATGGCACATC CCGGAAGGTG 480 CTCTTCTGGC AGGACCTTGACCAGCCGAGG GCCATCGCCT TGGACCCCGC TCACGGGTAC 540 ATGTACTGGA CAGACTGGGGTGAGACGCCC CGGATTGAGC GGGCAGGGAT GGATGGCAGC 600 ACCCGGAAGA TCATTGTGGACTCGGACATT TACTGGCCCA ATGGACTGAC CATCGACCTG 660 GAGGAGCAGA AGCTCTACTGGGCTGACGCC AAGCTCAGCT TCATCCACCG TGCCAACCTG 720 GACGGCTCGT TCCGGCAGAAGGTGGTGGAG GGCAGCCTGA CGCACCCCTT CGCCCTGACG 780 CTCTCCGGGG ACACTCTGTACTGGACAGAC TGGCAGACCC GCTCCATCCA TGCCTGCAAC 840 AAGCGCACTG GGGGGAAGAGGAAGGAGATC CTGAGTGCCC TCTACTCACC CATGGACATC 900 CAGGTGCTGA GCCAGGAGCGGCAGCCTTTC TTCCACACTC GCTGTGAGGA GGACAATGGC 960 GGCTGCTCCC ACCTGTGCCTGCTGTCCCCA AGCGAGCCTT TCTACACATG CGCCTGCCCC 1020 ACGGGTGTGC AGCTGCAGGACAACGGCAGG ACGTGTAAGG CAGGAGCCGA GGAGGTGCTG 1080 CTGCTGGCCC GGCGGACGGACCTACGGAGG ATCTCGCTGG ACACGCCGGA CTTTACCGAC 1140 ATCGTGCTGC AGGTGGACGACATCCGGCAC GCCATTGCCA TCGACTACGA CCCGCTAGAG 1200 GGCTATGTCT ACTGGACAGATGACGAGGTG CGGGCCATCC GCAGGGCGTA CCTGGACGGG 1260 TCTGGGGCGC AGACGCTGGTCAACACCGAG ATCAACGACC CCGATGGCAT CGCGGTCGAC 1320 TGGGTGGCCC GAAACCTCTACTGGACCGAC ACGGGCACGG ACCGCATCGA GGTGACGCGC 1380 CTCAACGGCA CCTCCCGCAAGATCCTGGTG TCGGAGGACC TGGACGAGCC CCGAGCCATC 1440 GCACTGCACC CCGTGATGGGCCTCATGTAC TGGACAGACT GGGGAGAGAA CCCTAAAATC 1500 GAGTGTGCCA ACTTGGATGGGCAGGAGCGG CGTGTGCTGG TCAATGCCTC CCTCGGGTGG 1560 CCCAACGGCC TGGCCCTGGACCTGCAGGAG GGGAAGCTCT ACTGGGGAGA CGCCAAGACA 1620 GACAAGATCG AGGTGATCAATGTTGATGGG ACGAAGAGGC GGACCCTCCT GGAGGACAAG 1680 
CTCCCGCACA TTTTCGGGTTCACGCTGCTG GGGGACTTCA TCTACTGGAC TGACTGGCAG 1740 CGCCGCAGCA TCGAGCGGGTGCACAAGGTC AAGGCCAGCC GGGACGTCAT CATTGACCAG 1800 CTGCCCGACC TGATGGGGCTCAAAGCTGTG AATGTGGCCA AGGTCGTCGG AACCAACCCG 1860 TGTGCGGACA GGAACGGGGGGTGCAGCCAC CTGTGCTTCT TCACACCCCA CGCAACCCGG 1920 TGTGGCTGCC CCATCGGCCTGGAGCTGCTG AGTGACATGA AGACCTGCAT CGTGCCTGAG 1980 GCCTTCTTGG TCTTCACCAGCAGAGCCGCC ATCCACAGGA TCTCCCTCGA GACCAATAAC 2040 AACGACGTGC CATCCCGCTCACGGGCGTCA AGGAGGCCTC AGCCCTGGAC TTTGATGTGT 2100 CCAACAACCA CATCTACTGGACAGACGTCA GCCTGAAGAC CATCAGCCGC GCCTTCATGA 2160 ACGGGAGCTC GGTGGAGCACGTGGTGGAGT TTGGCCTTGA CTACCCCGAG GGCATGGCCG 2220 TTGACTGGAT GGGCAAGAACCTCTACTGGG CCGACACTGG GACCAACAGA ATCGAAGTGG 2280 CGCGGCTGGA CGGGCAGTTCCGGCAAGTCC TCGTGTGGAG GGACTTGGAC AACCCGAGGT 2340 CGCTGGCCCT GGATCCCACCAAGGGCTACA TCTACTGGAC CGAGTGGGGC GGCAAGCCGA 2400 GGATCGTGCG GGCCTTCATGGACGGGACCA ACTGCATGAC GCTGGTGGAC AAGGTGGGCC 2460 GGGCCAACGA CCTCACCATTGACTACGCTG ACCAGCGCCT CTACTGGACC GACCTGGACA 2520 CCAACATGAT CGAGTCGTCCAACATGCTGG GTCAGGAGCG GGTCGTGATT GCCGACGATC 2580 TCCCGCACCC GTTCGGTCTGACGCAGTACA GCGATTATAT CTACTGGACA GACTGGAATC 2640 TGCACAGCAT TGAGCGGGCCGACAAGACTA GCGGCCGGAA CCGCACCCTC ATCCAGGGCC 2700 ACCTGGACTT CGTGATGGACATCCTGGTGT TCCACTCCTC CCGCCAGGAT GGCCTCAATG 2760 ACTGTATGCA CAACAACGGGCAGTGTGGGC AGCTGTGCCT TGCCATCCCC GGCGGCCACC 2820 GCTGCGGCTG CGCCTCACACTACACCCTGG ACCCCAGCAG CCGCAACTGC AGCCCGCCCA 2880 CCACCTTCTT GCTGTTCAGCCAGAAATCTG CCATCAGTCG GATGATCCCG GACGACCAGC 2940 ACAGCCCGGA TCTCATCCTGCCCCTGCATG GACTGAGGAA CGTCAAAGCC ATCGACTATG 3000 ACCCACTGGA CAAGTTCATCTACTGGGTGG ATGGGCGCCA GAACATCAAG CGAGCCAAGG 3060 ACGACGGGAC CCAGCCCTTTGTTTTGACCT CTCTGAGCCA AGGCCAAAAC CCAGACAGGC 3120 AGCCCCACGA CCTCAGCATCGACATCTACA GCCGGACACT GTTCTGGACG TGCGAGGCCA 3180 CCAATACCAT CAACGTCCACAGGCTGAGCG GGGAAGCCAT GGGGGTGGTG CTGCGTGGGG 3240 ACCGCGACAA GCCCAGGGCCATCGTCGTCA ACGCGGAGCG AGGGTACCTG TACTTCACCA 3300 ACATGCAGGA CCGGGCAGCCAAGATCGAAC GCGCAGCCCT GGACGGCACC GAGCGCGAGG 3360 TCCTCTTCAC CACCGGCCTCATCCGCCCTG 
TGGCCCTGGT GGTAGACAAC ACACTGGGCA 3420 AGCTGTTCTG GGTGGACGCGGACCTGAAGC GCATTGAGAG CTGTGACCTG TCAGGGGCCA 3480 ACCGCCTGAC CCTGGAGGACGCCAACATCG TGCAGCCTCT GGGCCTGACC ATCCTTGGCA 3540 AGCATCTCTA CTGGATCGACCGCCAGCAGC AGATGATCGA GCGTGTGGAG AAGACCACCG 3600 GGGACAAGCG GACTCGCATCCAGGGCCGTG TCGCCCACCT CACTGGCATC CATGCAGTGG 3660 AGGAAGTCAG CCTGGAGGAGTTCTCAGCCC ACCCATGTGC CCGTGACAAT GGTGGCTGCT 3720 CCCACATCTG TATTGCCAAGGGTGATGGGA CACCACGGTG CTCATGCCCA GTCCACCTCG 3780 TGCTCCTGCA GAACCTGCTGACCTGTGGAG AGCCGCCCAC CTGCTCCCCG GACCAGTTTG 3840 CATGTGCCAC AGGGGAGATCGACTGTATCC CCGGGGCCTG GCGCTGTGAC GGCTTTCCCG 3900 AGTGCGATGA CCAGAGCGACGAGGAGGGCT GCCCCGTGTG CTCCGCCGCC CAGTTCCCCT 3960 GCGCGCGGGG TCAGTGTGTGGACCTGCGCC TGCGCTGCGA CGGCGAGGCA GACTGTCAGG 4020 ACCGCTCAGA CGAGGCGGACTGTGACGCCA TCTGCCTGCC CAACCAGTTC CGGTGTGCGA 4080 GCGGCCAGTG TGTCCTATCAAACAGCAGTG CGACTCCTTC CCCGACTGTA TCGACGGCTC 4140 CGACGAGCTC ATGTGTGAAATCACCAAGCC GCCCTCAGAC GACAGCCCGG CCCACAGCAG 4200 TGCCATCGGG CCCGTCATTGGCATCATCCT CTCTCTCTTC GTCATGGGTG GTGTCTATTT 4260 TGTGTGCCAG CGCGTGGTGTGCCAGCGCTA TGCGGGGGCC AACGGGCCCT TCCCGCACGA 4320 GTATGTCAGC GGGACCCCGCACGTGCCCCT CAATTTCATA GCCCCGGGCG GTTCCCAGCA 4380 TGGCCCCTTC ACAGGCATCGCATGCGGAAA GTCCATGATG AGCTCCGTGA GCCTGATGGG 4440 GGGCCGGGGC GGGGTGCCCCTCTACGACCG GAACCACGTC ACAGGGGCCT CGTCCAGCAG 4500 CTCGTCCAGC ACGAAGGCCACGCTGTACCC GCCGATCCTG AACCCGCCGC CCTCCCCGGC 4560 CACGGACCCC TCCCTGTACAACATGGACAT GTTCTACTCT TCAAACATTC CGGCCACTGT 4620 GAGACCGTAC AGGCCCTACATCATTCGAGG AATGGCGCCC CCGACGACGC CCTGCAGCAC 4680 CGACGTGTGT GACAGCGACTACAGCGCCAG CCGCTGGAAG GCCAGCAAGT ACTACCTGGA 4740 TTTGAACTCG GACTCAGACCCCTATCCACC CCCACCCACG CCCCACAGCC AGTACCTGTC 4800 GGCGGAGGAC AGCTGCCCGCCCTCGCCCGC CACCGAGAGG AGCTACTTCC ATCTCTTCCC 4860 GCCCCCTCCG TCCCCCTGCACGGACTCATC CTGACCTCGG CCGGGCCACT CTGGCTTCTC 4920 TGTGCCCCTG TAAATAGTTTTAAATATGAA CAAAGAAAAA AATATATTTT ATGATTTAAA 4980 AAATAAATAT AATTGGGATTTTAAAAACAT GAGAAATGTG AACTGTGATG GGGTGGGCAG 5040 GGCTGGGAGA ACTTTGTACAGTGGAACAAA TATTTATAAA CTTAATTTTG TAAAACAG 5098 
4843 base pairs nucleicacid single linear 2 ATGGAGGCAG CGCCGCCCGG GCCGCCGTGG CCGCTGCTGCTGCTGCTGCT GCTGCTGCTG 60 GCGCTGTGCG GCTGCCCGGC CCCCGCCGCG GCCTCGCCGCTCCTGCTATT TGCCAACCGC 120 CGGGACGTAC GGCTGGTGGA CGCCGGCGGA GTCAAGCTGGAGTCCACCAT CGTGGTCAGC 180 GGCCTGGAGG ATGCGGCCGC AGTGGACTTC CAGTTTTCCAAGGGAGCCGT GTACTGGACA 240 GACGTGAGCG AGGAGGCCAT CAAGCAGACC TACCTGAACCAGACGGGGGC CGCCGTGCAG 300 AACGTGGTCA TCTCCGGCCT GGTCTCTCCC GACGGCCTCGCCTGCGACTG GGTGGGCAAG 360 AAGCTGTACT GGACGGACTC AGAGACCAAC CGCATCGAGGTGGCCAACCT CAATGGCACA 420 TCCCGGAAGG TGCTCTTCTG GCAGGACCTT GACCAGCCGAGGGCCATCGC CTTGGACCCC 480 GCTCACGGGT ACATGTACTG GACAGACTGG GGTGAGACGCCCCGGATTGA GCGGGCAGGG 540 ATGGATGGCA GCACCCGGAA GATCATTGTG GACTCGGACATTTACTGGCC CAATGGACTG 600 ACCATCGACC TGGAGGAGCA GAAGCTCTAC TGGGCTGACGCCAAGCTCAG CTTCATCCAC 660 CGTGCCAACC TGGACGGCTC GTTCCGGCAG AAGGTGGTGGAGGGCAGCCT GACGCACCCC 720 TTCGCCCTGA CGCTCTCCGG GGACACTCTG TACTGGACAGACTGGCAGAC CCGCTCCATC 780 CATGCCTGCA ACAAGCGCAC TGGGGGGAAG AGGAAGGAGATCCTGAGTGC CCTCTACTCA 840 CCCATGGACA TCCAGGTGCT GAGCCAGGAG CGGCAGCCTTTCTTCCACAC TCGCTGTGAG 900 GAGGACAATG GCGGCTGCTC CCACCTGTGC CTGCTGTCCCCAAGCGAGCC TTTCTACACA 960 TGCGCCTGCC CCACGGGTGT GCAGCTGCAG GACAACGGCAGGACGTGTAA GGCAGGAGCC 1020 GAGGAGGTGC TGCTGCTGGC CCGGCGGACG GACCTACGGAGGATCTCGCT GGACACGCCG 1080 GACTTTACCG ACATCGTGCT GCAGGTGGAC GACATCCGGCACGCCATTGC CATCGACTAC 1140 GACCCGCTAG AGGGCTATGT CTACTGGACA GATGACGAGGTGCGGGCCAT CCGCAGGGCG 1200 TACCTGGACG GGTCTGGGGC GCAGACGCTG GTCAACACCGAGATCAACGA CCCCGATGGC 1260 ATCGCGGTCG ACTGGGTGGC CCGAAACCTC TACTGGACCGACACGGGCAC GGACCGCATC 1320 GAGGTGACGC GCCTCAACGG CACCTCCCGC AAGATCCTGGTGTCGGAGGA CCTGGACGAG 1380 CCCCGAGCCA TCGCACTGCA CCCCGTGATG GGCCTCATGTACTGGACAGA CTGGGGAGAG 1440 AACCCTAAAA TCGAGTGTGC CAACTTGGAT GGGCAGGAGCGGCGTGTGCT GGTCAATGCC 1500 TCCCTCGGGT GGCCCAACGG CCTGGCCCTG GACCTGCAGGAGGGGAAGCT CTACTGGGGA 1560 GACGCCAAGA CAGACAAGAT CGAGGTGATC AATGTTGATGGGACGAAGAG GCGGACCCTC 1620 CTGGAGGACA AGCTCCCGCA CATTTTCGGG TTCACGCTGCTGGGGGACTT CATCTACTGG 1680 ACTGACTGGC 
AGCGCCGCAG CATCGAGCGG GTGCACAAGGTCAAGGCCAG CCGGGACGTC 1740 ATCATTGACC AGCTGCCCGA CCTGATGGGG CTCAAAGCTGTGAATGTGGC CAAGGTCGTC 1800 GGAACCAACC CGTGTGCGGA CAGGAACGGG GGGTGCAGCCACCTGTGCTT CTTCACACCC 1860 CACGCAACCC GGTGTGGCTG CCCCATCGGC CTGGAGCTGCTGAGTGACAT GAAGACCTGC 1920 ATCGTGCCTG AGGCCTTCTT GGTCTTCACC AGCAGAGCCGCCATCCACAG GATCTCCCTC 1980 GAGACCAATA ACAACGACGT GGCCATCCCG CTCACGGGCGTCAAGGAGGC CTCAGCCCTG 2040 GACTTTGAGT GTCCAACAAC CACATCTACT GGACAGACGTCAGCCTGAAG ACCATCAGCC 2100 GCGCCTTCAT GAACGGGAGC TCGGTGGAGC ACGTGGTGGAGTTTGGCCTT GACTACCCCG 2160 AGGGCATGGC CGTTGACTGG ATGGGCAAGA ACCTCTACTGGGCCGACACT GGGACCAACA 2220 GAATCGAAGT GGCGCGGCTG GACGGGCAGT TCCGGCAAGTCCTCGTGTGG AGGGACTTGG 2280 ACAACCCGAG GTCGCTGGCC CTGGATCCCA CCAAGGGCTACATCTACTGG ACCGAGTGGG 2340 GCGGCAAGCC GAGGATCGTG CGGGCCTTCA TGGACGGGACCAACTGCATG ACGCTGGTGG 2400 ACAAGGTGGG CCGGGCCAAC GACCTCACCA TTGACTACGCTGACCAGCGC CTCTACTGGA 2460 CCGACCTGGA CACCAACATG ATCGAGTCGT CCAACATGCTGGGTCAGGAG CGGGTCGTGA 2520 TTGCCGACGA TCTCCCGCAC CCGTTCGGTC TGACGCAGTACAGCGATTAT ATCTACTGGA 2580 CAGACTGGAA TCTGCACAGC ATTGAGCGGG CCGACAAGACTAGCGGCCGG AACCGCACCC 2640 TCATCCAGGG CCACCTGGAC TTCGTGATGG ACATCCTGGTGTTCCACTCC TCCCGCCAGG 2700 ATGGCCTCAA TGACTGTATG CACAACAACG GGCAGTGTGGGCAGCTGTGC CTTGCCATCC 2760 CCGGCGGCCA CCGCTGCGGC TGCGCCTCAC ACTACACCCTGGACCCCAGC AGCCGCAACT 2820 GCAGCCCGCC CACCACCTTC TTGCTGTTCA GCCAGAAATCTGCCATCAGT CGGATGATCC 2880 CGGACGACCA GCACAGCCCG GATCTCATCC TGCCCCTGCATGGACTGAGG AACGTCAAAG 2940 CCATCGACTA TGACCCACTG GACAAGTTCA TCTACTGGGTGGATGGGCGC CAGAACATCA 3000 AGCGAGCCAA GGACGACGGG ACCCAGCCCT TTGTTTTGACCTCTCTGAGC CAAGGCCAAA 3060 ACCCAGACAG GCAGCCCCAC GACCTCAGCA TCGACATCTACAGCCGGACA CTGTTCTGGA 3120 CGTGCGAGGC CACCAATACC ATCAACGTCC ACAGGCTGAGCGGGGAAGCC ATGGGGGTGG 3180 TGCTGCGTGG GGACCGCGAC AAGCCCAGGG CCATCGTCGTCAACGCGGAG CGAGGGTACC 3240 TGTACTTCAC CAACATGCAG GACCGGGCAG CCAAGATCGAACGCGCAGCC CTGGACGGCA 3300 CCGAGCGCGA GGTCCTCTTC ACCACCGGCC TCATCCGCCCTGTGGCCCTG GTGGTAGACA 3360 ACACACTGGG CAAGCTGTTC TGGGTGGACG 
CGGACCTGAAGCGCATTGAG AGCTGTGACC 3420 TGTCAGGGGC CAACCGCCTG ACCCTGGAGG ACGCCAACATCGTGCAGCCT CTGGGCCTGA 3480 CCATCCTTGG CAAGCATCTC TACTGGATCG ACCGCCAGCAGCAGATGATC GAGCGTGTGG 3540 AGAAGACCAC CGGGGACAAG CGGACTCGCA TCCAGGGCCGTGTCGCCCAC CTCACTGGCA 3600 TCCATGCAGT GGAGGAAGTC AGCCTGGAGG AGTTCTCAGCCCACCCATGT GCCCGTGACA 3660 ATGGTGGCTG CTCCCACATC TGTATTGCCA AGGGTGATGGGACACCACGG TGCTCATGCC 3720 CAGTCCACCT CGTGCTCCTG CAGAACCTGC TGACCTGTGGAGAGCCGCCC ACCTGCTCCC 3780 CGGACCAGTT TGCATGTGCC ACAGGGGAGA TCGACTGTATCCCCGGGGCC TGGCGCTGTG 3840 ACGGCTTTCC CGAGTGCGAT GACCAGAGCG ACGAGGAGGGCTGCCCCGTG TGCTCCGCCG 3900 CCCAGTTCCC CTGCGCGCGG GGTCAGTGTG TGGACCTGCGCCTGCGCTGC GACGGCGAGG 3960 CAGACTGTCA GGACCGCTCA GACGAGGCGG ACTGTGACGCCATCTGCCTG CCCAACCAGT 4020 TCCGGTGTGC GAGCGGCCAG TGTGTCCTCA TCAAACAGCAGTGCGACTCC TTCCCCGACT 4080 GTATCGACGG CTCCGAGAGC TCATGTGTGA AATCACCAAGCCGCCCTCAG ACGACAGCCC 4140 GGCCCACAGC AGTGCCATCG GGCCCGTCAT TGGCATCATCCTCTCTCTCT TCGTCATGGG 4200 TGGTGTCTAT TTTGTGTGCC AGCGCGTGGT GTGCCAGCGCTATGCGGGGG CCAACGGGCC 4260 CTTCCCGCAC GAGTATGTCA GCGGGACCCC GCACGTGCCCCTCAATTTCA TAGCCCCGGG 4320 CGGTTCCCAG CATGGCCCCT TCACAGGCAT CGCATGCGGAAAGTCCATGA TGAGCTCCGT 4380 GAGCCTGATG GGGGGCCGGG GCGGGGTGCC CCTCTACGACCGGAACCACG TCACAGGGGC 4440 CTCGTCCAGC AGCTCGTCCA GCACGAAGGC CACGCTGTACCCGCCGATCC TGAACCCGCC 4500 GCCCTCCCCG GCCACGGACC CCTCCCTGTA CAACATGGACATGTTCTACT CTTCAAACAT 4560 TCCGGCCACT GTGAGACCGT ACAGGCCCTA CATCATTCGAGGAATGGCGC CCCCGACGAC 4620 GCCCTGCAGC ACCGACGTGT GTGACAGCGA CTACAGCGCCAGCCGCTGGA AGGCCAGCAA 4680 GTACTACCTG GATTTGAACT CGGACTCAGA CCCCTATCCACCCCCACCCA CGCCCCACAG 4740 CCAGTACCTG TCGGCGGAGG ACAGCTGCCC GCCCTCGCCCGCCACCGAGA GGAGCTACTT 4800 CCATCTCTTC CCGCCCCCTC CGTCCCCCTG CACGGACTCATCC 4843 1615 amino acids amino acid linear 3 Met Glu Ala Ala Pro ProGly Pro Pro Trp Pro Leu Leu Leu Leu Leu 1 5 10 15 Leu Leu Leu Leu AlaLeu Cys Gly Cys Pro Ala Pro Ala Ala Ala Ser 20 25 30 Pro Leu Leu Leu PheAla Asn Arg Arg Asp Val Arg Leu Val Asp Ala 35 40 45 Gly Gly Val Lys LeuGlu Ser Thr Ile Val Val Ser 
Gly Leu Glu Asp 50 55 60 Ala Ala Ala Val AspPhe Gln Phe Ser Lys Gly Ala Val Tyr Trp Thr 65 70 75 80 Asp Val Ser GluGlu Ala Ile Lys Gln Thr Tyr Leu Asn Gln Thr Gly 85 90 95 Ala Ala Val GlnAsn Val Val Ile Ser Gly Leu Val Ser Pro Asp Gly 100 105 110 Leu Ala CysAsp Trp Val Gly Lys Lys Leu Tyr Trp Thr Asp Ser Glu 115 120 125 Thr AsnArg Ile Glu Val Ala Asn Leu Asn Gly Thr Ser Arg Lys Val 130 135 140 LeuPhe Trp Gln Asp Leu Asp Gln Pro Arg Ala Ile Ala Leu Asp Pro 145 150 155160 Ala His Gly Tyr Met Tyr Trp Thr Asp Trp Gly Glu Thr Pro Arg Ile 165170 175 Glu Arg Ala Gly Met Asp Gly Ser Thr Arg Lys Ile Ile Val Asp Ser180 185 190 Asp Ile Tyr Trp Pro Asn Gly Leu Thr Ile Asp Leu Glu Glu GlnLys 195 200 205 Leu Tyr Trp Ala Asp Ala Lys Leu Ser Phe Ile His Arg AlaAsn Leu 210 215 220 Asp Gly Ser Phe Arg Gln Lys Val Val Glu Gly Ser LeuThr His Pro 225 230 235 240 Phe Ala Leu Thr Leu Ser Gly Asp Thr Leu TyrTrp Thr Asp Trp Gln 245 250 255 Thr Arg Ser Ile His Ala Cys Asn Lys ArgThr Gly Gly Lys Arg Lys 260 265 270 Glu Ile Leu Ser Ala Leu Tyr Ser ProMet Asp Ile Gln Val Leu Ser 275 280 285 Gln Glu Arg Gln Pro Phe Phe HisThr Arg Cys Glu Glu Asp Asn Gly 290 295 300 Gly Cys Ser His Leu Cys LeuLeu Ser Pro Ser Glu Pro Phe Tyr Thr 305 310 315 320 Cys Ala Cys Pro ThrGly Val Gln Leu Gln Asp Asn Gly Arg Thr Cys 325 330 335 Lys Ala Gly AlaGlu Glu Val Leu Leu Leu Ala Arg Arg Thr Asp Leu 340 345 350 Arg Arg IleSer Leu Asp Thr Pro Asp Phe Thr Asp Ile Val Leu Gln 355 360 365 Val AspAsp Ile Arg His Ala Ile Ala Ile Asp Tyr Asp Pro Leu Glu 370 375 380 GlyTyr Val Tyr Trp Thr Asp Asp Glu Val Arg Ala Ile Arg Arg Ala 385 390 395400 Tyr Leu Asp Gly Ser Gly Ala Gln Thr Leu Val Asn Thr Glu Ile Asn 405410 415 Asp Pro Asp Gly Ile Ala Val Asp Trp Val Ala Arg Asn Leu Tyr Trp420 425 430 Thr Asp Thr Gly Thr Asp Arg Ile Glu Val Thr Arg Leu Asn GlyThr 435 440 445 Ser Arg Lys Ile Leu Val Ser Glu Asp Leu Asp Glu Pro ArgAla Ile 450 455 460 Ala Leu His Pro Val Met Gly Leu Met Tyr Trp Thr AspTrp Gly Glu 465 470 475 480 Asn Pro 
Lys Ile Glu Cys Ala Asn Leu Asp GlyGln Glu Arg Arg Val 485 490 495 Leu Val Asn Ala Ser Leu Gly Trp Pro AsnGly Leu Ala Leu Asp Leu 500 505 510 Gln Glu Gly Lys Leu Tyr Trp Gly AspAla Lys Thr Asp Lys Ile Glu 515 520 525 Val Ile Asn Val Asp Gly Thr LysArg Arg Thr Leu Leu Glu Asp Lys 530 535 540 Leu Pro His Ile Phe Gly PheThr Leu Leu Gly Asp Phe Ile Tyr Trp 545 550 555 560 Thr Asp Trp Gln ArgArg Ser Ile Glu Arg Val His Lys Val Lys Ala 565 570 575 Ser Arg Asp ValIle Ile Asp Gln Leu Pro Asp Leu Met Gly Leu Lys 580 585 590 Ala Val AsnVal Ala Lys Val Val Gly Thr Asn Pro Cys Ala Asp Arg 595 600 605 Asn GlyGly Cys Ser His Leu Cys Phe Phe Thr Pro His Ala Thr Arg 610 615 620 CysGly Cys Pro Ile Gly Leu Glu Leu Leu Ser Asp Met Lys Thr Cys 625 630 635640 Ile Val Pro Glu Ala Phe Leu Val Phe Thr Ser Arg Ala Ala Ile His 645650 655 Arg Ile Ser Leu Glu Thr Asn Asn Asn Asp Val Ala Ile Pro Leu Thr660 665 670 Gly Val Lys Glu Ala Ser Ala Leu Asp Phe Asp Val Ser Asn AsnHis 675 680 685 Ile Tyr Trp Thr Asp Val Ser Leu Lys Thr Ile Ser Arg AlaPhe Met 690 695 700 Asn Gly Ser Ser Val Glu His Val Val Glu Phe Gly LeuAsp Tyr Pro 705 710 715 720 Glu Gly Met Ala Val Asp Trp Met Gly Lys AsnLeu Tyr Trp Ala Asp 725 730 735 Thr Gly Thr Asn Arg Ile Glu Val Ala ArgLeu Asp Gly Gln Phe Arg 740 745 750 Gln Val Leu Val Trp Arg Asp Leu AspAsn Pro Arg Ser Leu Ala Leu 755 760 765 Asp Pro Thr Lys Gly Tyr Ile TyrTrp Thr Glu Trp Gly Gly Lys Pro 770 775 780 Arg Ile Val Arg Ala Phe MetAsp Gly Thr Asn Cys Met Thr Leu Val 785 790 795 800 Asp Lys Val Gly ArgAla Asn Asp Leu Thr Ile Asp Tyr Ala Asp Gln 805 810 815 Arg Leu Tyr TrpThr Asp Leu Asp Thr Asn Met Ile Glu Ser Ser Asn 820 825 830 Met Leu GlyGln Glu Arg Val Val Ile Ala Asp Asp Leu Pro His Pro 835 840 845 Phe GlyLeu Thr Gln Tyr Ser Asp Tyr Ile Tyr Trp Thr Asp Trp Asn 850 855 860 LeuHis Ser Ile Glu Arg Ala Asp Lys Thr Ser Gly Arg Asn Arg Thr 865 870 875880 Leu Ile Gln Gly His Leu Asp Phe Val Met Asp Ile Leu Val Phe His 885890 895 Ser Ser Arg Gln Asp Gly Leu Asn Asp Cys 
Met His Asn Asn Gly Gln900 905 910 Cys Gly Gln Leu Cys Leu Ala Ile Pro Gly Gly His Arg Cys GlyCys 915 920 925 Ala Ser His Tyr Thr Leu Asp Pro Ser Ser Arg Asn Cys SerPro Pro 930 935 940 Thr Thr Phe Leu Leu Phe Ser Gln Lys Ser Ala Ile SerArg Met Ile 945 950 955 960 Pro Asp Asp Gln His Ser Pro Asp Leu Ile LeuPro Leu His Gly Leu 965 970 975 Arg Asn Val Lys Ala Ile Asp Tyr Asp ProLeu Asp Lys Phe Ile Tyr 980 985 990 Trp Val Asp Gly Arg Gln Asn Ile LysArg Ala Lys Asp Asp Gly Thr 995 1000 1005 Gln Pro Phe Val Leu Thr SerLeu Ser Gln Gly Gln Asn Pro Asp Arg 1010 1015 1020 Gln Pro His Asp LeuSer Ile Asp Ile Tyr Ser Arg Thr Leu Phe Trp 1025 1030 1035 1040 Thr CysGlu Ala Thr Asn Thr Ile Asn Val His Arg Leu Ser Gly Glu 1045 1050 1055Ala Met Gly Val Val Leu Arg Gly Asp Arg Asp Lys Pro Arg Ala Ile 10601065 1070 Val Val Asn Ala Glu Arg Gly Tyr Leu Tyr Phe Thr Asn Met GlnAsp 1075 1080 1085 Arg Ala Ala Lys Ile Glu Arg Ala Ala Leu Asp Gly ThrGlu Arg Glu 1090 1095 1100 Val Leu Phe Thr Thr Gly Leu Ile Arg Pro ValAla Leu Val Val Asp 1105 1110 1115 1120 Asn Thr Leu Gly Lys Leu Phe TrpVal Asp Ala Asp Leu Lys Arg Ile 1125 1130 1135 Glu Ser Cys Asp Leu SerGly Ala Asn Arg Leu Thr Leu Glu Asp Ala 1140 1145 1150 Asn Ile Val GlnPro Leu Gly Leu Thr Ile Leu Gly Lys His Leu Tyr 1155 1160 1165 Trp IleAsp Arg Gln Gln Gln Met Ile Glu Arg Val Glu Lys Thr Thr 1170 1175 1180Gly Asp Lys Arg Thr Arg Ile Gln Gly Arg Val Ala His Leu Thr Gly 11851190 1195 1200 Ile His Ala Val Glu Glu Val Ser Leu Glu Glu Phe Ser AlaHis Pro 1205 1210 1215 Cys Ala Arg Asp Asn Gly Gly Cys Ser His Ile CysIle Ala Lys Gly 1220 1225 1230 Asp Gly Thr Pro Arg Cys Ser Cys Pro ValHis Leu Val Leu Leu Gln 1235 1240 1245 Asn Leu Leu Thr Cys Gly Glu ProPro Thr Cys Ser Pro Asp Gln Phe 1250 1255 1260 Ala Cys Ala Thr Gly GluIle Asp Cys Ile Pro Gly Ala Trp Arg Cys 1265 1270 1275 1280 Asp Gly PhePro Glu Cys Asp Asp Gln Ser Asp Glu Glu Gly Cys Pro 1285 1290 1295 ValCys Ser Ala Ala Gln Phe Pro Cys Ala Arg Gly Gln Cys Val Asp 1300 13051310 Leu Arg 
Leu Arg Cys Asp Gly Glu Ala Asp Cys Gln Asp Arg Ser Asp1315 1320 1325 Glu Ala Asp Cys Asp Ala Ile Cys Leu Pro Asn Gln Phe ArgCys Ala 1330 1335 1340 Ser Gly Gln Cys Val Leu Ile Lys Gln Gln Cys AspSer Phe Pro Asp 1345 1350 1355 1360 Cys Ile Asp Gly Ser Asp Glu Leu MetCys Glu Ile Thr Lys Pro Pro 1365 1370 1375 Ser Asp Asp Ser Pro Ala HisSer Ser Ala Ile Gly Pro Val Ile Gly 1380 1385 1390 Ile Ile Leu Ser LeuPhe Val Met Gly Gly Val Tyr Phe Val Cys Gln 1395 1400 1405 Arg Val ValCys Gln Arg Tyr Ala Gly Ala Asn Gly Pro Phe Pro His 1410 1415 1420 GluTyr Val Ser Gly Thr Pro His Val Pro Leu Asn Phe Ile Ala Pro 1425 14301435 1440 Gly Gly Ser Gln His Gly Pro Phe Thr Gly Ile Ala Cys Gly LysSer 1445 1450 1455 Met Met Ser Ser Val Ser Leu Met Gly Gly Arg Gly GlyVal Pro Leu 1460 1465 1470 Tyr Asp Arg Asn His Val Thr Gly Ala Ser SerSer Ser Ser Ser Ser 1475 1480 1485 Thr Lys Ala Thr Leu Tyr Pro Pro IleLeu Asn Pro Pro Pro Ser Pro 1490 1495 1500 Ala Thr Asp Pro Ser Leu TyrAsn Met Asp Met Phe Tyr Ser Ser Asn 1505 1510 1515 1520 Ile Pro Ala ThrVal Arg Pro Tyr Arg Pro Tyr Ile Ile Arg Gly Met 1525 1530 1535 Ala ProPro Thr Thr Pro Cys Ser Thr Asp Val Cys Asp Ser Asp Tyr 1540 1545 1550Ser Ala Ser Arg Trp Lys Ala Ser Lys Tyr Tyr Leu Asp Leu Asn Ser 15551560 1565 Asp Ser Asp Pro Tyr Pro Pro Pro Pro Thr Pro His Ser Gln TyrLeu 1570 1575 1580 Ser Ala Glu Asp Ser Cys Pro Pro Ser Pro Ala Thr GluArg Ser Tyr 1585 1590 1595 1600 Phe His Leu Phe Pro Pro Pro Pro Ser ProCys Thr Asp Ser Ser 1605 1610 1615 1591 amino acids amino acid linear 4Cys Pro Ala Pro Ala Ala Ala Ser Pro Leu Leu Leu Phe Ala Asn Arg 1 5 1015 Arg Asp Val Arg Leu Val Asp Ala Gly Gly Val Lys Leu Glu Ser Thr 20 2530 Ile Val Val Ser Gly Leu Glu Asp Ala Ala Ala Val Asp Phe Gln Phe 35 4045 Ser Lys Gly Ala Val Tyr Trp Thr Asp Val Ser Glu Glu Ala Ile Lys 50 5560 Gln Thr Tyr Leu Asn Gln Thr Gly Ala Ala Val Gln Asn Val Val Ile 65 7075 80 Ser Gly Leu Val Ser Pro Asp Gly Leu Ala Cys Asp Trp Val Gly Lys 8590 95 Lys Leu Tyr Trp Thr Asp Ser Glu Thr Asn 
Arg Ile Glu Val Ala Asn 100 105 110
Leu Asn Gly Thr Ser Arg Lys Val Leu Phe Trp Gln Asp Leu Asp Gln 115 120 125
Pro Arg Ala Ile Ala Leu Asp Pro Ala His Gly Tyr Met Tyr Trp Thr 130 135 140
Asp Trp Gly Glu Thr Pro Arg Ile Glu Arg Ala Gly Met Asp Gly Ser 145 150 155 160
Thr Arg Lys Ile Ile Val Asp Ser Asp Ile Tyr Trp Pro Asn Gly Leu 165 170 175
Thr Ile Asp Leu Glu Glu Gln Lys Leu Tyr Trp Ala Asp Ala Lys Leu 180 185 190
Ser Phe Ile His Arg Ala Asn Leu Asp Gly Ser Phe Arg Gln Lys Val 195 200 205
Val Glu Gly Ser Leu Thr His Pro Phe Ala Leu Thr Leu Ser Gly Asp 210 215 220
Thr Leu Tyr Trp Thr Asp Trp Gln Thr Arg Ser Ile His Ala Cys Asn 225 230 235 240
Lys Arg Thr Gly Gly Lys Arg Lys Glu Ile Leu Ser Ala Leu Tyr Ser 245 250 255
Pro Met Asp Ile Gln Val Leu Ser Gln Glu Arg Gln Pro Phe Phe His 260 265 270
Thr Arg Cys Glu Glu Asp Asn Gly Gly Cys Ser His Leu Cys Leu Leu 275 280 285
Ser Pro Ser Glu Pro Phe Tyr Thr Cys Ala Cys Pro Thr Gly Val Gln 290 295 300
Leu Gln Asp Asn Gly Arg Thr Cys Lys Ala Gly Ala Glu Glu Val Leu 305 310 315 320
Leu Leu Ala Arg Arg Thr Asp Leu Arg Arg Ile Ser Leu Asp Thr Pro 325 330 335
Asp Phe Thr Asp Ile Val Leu Gln Val Asp Asp Ile Arg His Ala Ile 340 345 350
Ala Ile Asp Tyr Asp Pro Leu Glu Gly Tyr Val Tyr Trp Thr Asp Asp 355 360 365
Glu Val Arg Ala Ile Arg Arg Ala Tyr Leu Asp Gly Ser Gly Ala Gln 370 375 380
Thr Leu Val Asn Thr Glu Ile Asn Asp Pro Asp Gly Ile Ala Val Asp 385 390 395 400
Trp Val Ala Arg Asn Leu Tyr Trp Thr Asp Thr Gly Thr Asp Arg Ile 405 410 415
Glu Val Thr Arg Leu Asn Gly Thr Ser Arg Lys Ile Leu Val Ser Glu 420 425 430
Asp Leu Asp Glu Pro Arg Ala Ile Ala Leu His Pro Val Met Gly Leu 435 440 445
Met Tyr Trp Thr Asp Trp Gly Glu Asn Pro Lys Ile Glu Cys Ala Asn 450 455 460
Leu Asp Gly Gln Glu Arg Arg Val Leu Val Asn Ala Ser Leu Gly Trp 465 470 475 480
Pro Asn Gly Leu Ala Leu Asp Leu Gln Glu Gly Lys Leu Tyr Trp Gly 485 490 495
Asp Ala Lys Thr Asp Lys Ile Glu Val Ile Asn Val Asp Gly Thr Lys 500 505 510
Arg Arg Thr Leu Leu Glu Asp Lys Leu Pro His Ile Phe Gly Phe Thr 515 520 525
Leu Leu Gly Asp Phe Ile Tyr Trp Thr Asp Trp Gln Arg Arg Ser Ile 530 535 540
Glu Arg Val His Lys Val Lys Ala Ser Arg Asp Val Ile Ile Asp Gln 545 550 555 560
Leu Pro Asp Leu Met Gly Leu Lys Ala Val Asn Val Ala Lys Val Val 565 570 575
Gly Thr Asn Pro Cys Ala Asp Arg Asn Gly Gly Cys Ser His Leu Cys 580 585 590
Phe Phe Thr Pro His Ala Thr Arg Cys Gly Cys Pro Ile Gly Leu Glu 595 600 605
Leu Leu Ser Asp Met Lys Thr Cys Ile Val Pro Glu Ala Phe Leu Val 610 615 620
Phe Thr Ser Arg Ala Ala Ile His Arg Ile Ser Leu Glu Thr Asn Asn 625 630 635 640
Asn Asp Val Ala Ile Pro Leu Thr Gly Val Lys Glu Ala Ser Ala Leu 645 650 655
Asp Phe Asp Val Ser Asn Asn His Ile Tyr Trp Thr Asp Val Ser Leu 660 665 670
Lys Thr Ile Ser Arg Ala Phe Met Asn Gly Ser Ser Val Glu His Val 675 680 685
Val Glu Phe Gly Leu Asp Tyr Pro Glu Gly Met Ala Val Asp Trp Met 690 695 700
Gly Lys Asn Leu Tyr Trp Ala Asp Thr Gly Thr Asn Arg Ile Glu Val 705 710 715 720
Ala Arg Leu Asp Gly Gln Phe Arg Gln Val Leu Val Trp Arg Asp Leu 725 730 735
Asp Asn Pro Arg Ser Leu Ala Leu Asp Pro Thr Lys Gly Tyr Ile Tyr 740 745 750
Trp Thr Glu Trp Gly Gly Lys Pro Arg Ile Val Arg Ala Phe Met Asp 755 760 765
Gly Thr Asn Cys Met Thr Leu Val Asp Lys Val Gly Arg Ala Asn Asp 770 775 780
Leu Thr Ile Asp Tyr Ala Asp Gln Arg Leu Tyr Trp Thr Asp Leu Asp 785 790 795 800
Thr Asn Met Ile Glu Ser Ser Asn Met Leu Gly Gln Glu Arg Val Val 805 810 815
Ile Ala Asp Asp Leu Pro His Pro Phe Gly Leu Thr Gln Tyr Ser Asp 820 825 830
Tyr Ile Tyr Trp Thr Asp Trp Asn Leu His Ser Ile Glu Arg Ala Asp 835 840 845
Lys Thr Ser Gly Arg Asn Arg Thr Leu Ile Gln Gly His Leu Asp Phe 850 855 860
Val Met Asp Ile Leu Val Phe His Ser Ser Arg Gln Asp Gly Leu Asn 865 870 875 880
Asp Cys Met His Asn Asn Gly Gln Cys Gly Gln Leu Cys Leu Ala Ile 885 890 895
Pro Gly Gly His Arg Cys Gly Cys Ala Ser His Tyr Thr Leu Asp Pro 900 905 910
Ser Ser Arg Asn Cys Ser Pro Pro Thr Thr Phe Leu Leu Phe Ser Gln 915 920 925
Lys Ser Ala Ile Ser Arg Met Ile Pro Asp Asp Gln His Ser Pro Asp 930 935 940
Leu Ile Leu Pro Leu His Gly Leu Arg Asn Val Lys Ala Ile Asp Tyr 945 950 955 960
Asp Pro Leu Asp Lys Phe Ile Tyr Trp Val Asp Gly Arg Gln Asn Ile 965 970 975
Lys Arg Ala Lys Asp Asp Gly Thr Gln Pro Phe Val Leu Thr Ser Leu 980 985 990
Ser Gln Gly Gln Asn Pro Asp Arg Gln Pro His Asp Leu Ser Ile Asp 995 1000 1005
Ile Tyr Ser Arg Thr Leu Phe Trp Thr Cys Glu Ala Thr Asn Thr Ile 1010 1015 1020
Asn Val His Arg Leu Ser Gly Glu Ala Met Gly Val Val Leu Arg Gly 1025 1030 1035 1040
Asp Arg Asp Lys Pro Arg Ala Ile Val Val Asn Ala Glu Arg Gly Tyr 1045 1050 1055
Leu Tyr Phe Thr Asn Met Gln Asp Arg Ala Ala Lys Ile Glu Arg Ala 1060 1065 1070
Ala Leu Asp Gly Thr Glu Arg Glu Val Leu Phe Thr Thr Gly Leu Ile 1075 1080 1085
Arg Pro Val Ala Leu Val Val Asp Asn Thr Leu Gly Lys Leu Phe Trp 1090 1095 1100
Val Asp Ala Asp Leu Lys Arg Ile Glu Ser Cys Asp Leu Ser Gly Ala 1105 1110 1115 1120
Asn Arg Leu Thr Leu Glu Asp Ala Asn Ile Val Gln Pro Leu Gly Leu 1125 1130 1135
Thr Ile Leu Gly Lys His Leu Tyr Trp Ile Asp Arg Gln Gln Gln Met 1140 1145 1150
Ile Glu Arg Val Glu Lys Thr Thr Gly Asp Lys Arg Thr Arg Ile Gln 1155 1160 1165
Gly Arg Val Ala His Leu Thr Gly Ile His Ala Val Glu Glu Val Ser 1170 1175 1180
Leu Glu Glu Phe Ser Ala His Pro Cys Ala Arg Asp Asn Gly Gly Cys 1185 1190 1195 1200
Ser His Ile Cys Ile Ala Lys Gly Asp Gly Thr Pro Arg Cys Ser Cys 1205 1210 1215
Pro Val His Leu Val Leu Leu Gln Asn Leu Leu Thr Cys Gly Glu Pro 1220 1225 1230
Pro Thr Cys Ser Pro Asp Gln Phe Ala Cys Ala Thr Gly Glu Ile Asp 1235 1240 1245
Cys Ile Pro Gly Ala Trp Arg Cys Asp Gly Phe Pro Glu Cys Asp Asp 1250 1255 1260
Gln Ser Asp Glu Glu Gly Cys Pro Val Cys Ser Ala Ala Gln Phe Pro 1265 1270 1275 1280
Cys Ala Arg Gly Gln Cys Val Asp Leu Arg Leu Arg Cys Asp Gly Glu 1285 1290 1295
Ala Asp Cys Gln Asp Arg Ser Asp Glu Ala Asp Cys Asp Ala Ile Cys 1300 1305 1310
Leu Pro Asn Gln Phe Arg Cys Ala Ser Gly Gln Cys Val Leu Ile Lys 1315 1320 1325
Gln Gln Cys Asp Ser Phe Pro Asp Cys Ile Asp Gly Ser Asp Glu Leu 1330 1335 1340
Met Cys Glu Ile Thr Lys Pro Pro Ser Asp Asp Ser Pro Ala His Ser 1345 1350 1355 1360
Ser Ala Ile Gly Pro Val Ile Gly Ile Ile Leu Ser Leu Phe Val Met 1365 1370 1375
Gly Gly Val Tyr Phe Val Cys Gln Arg Val Val Cys Gln Arg Tyr Ala 1380 1385 1390
Gly Ala Asn Gly Pro Phe Pro His Glu Tyr Val Ser Gly Thr Pro His 1395 1400 1405
Val Pro Leu Asn Phe Ile Ala Pro Gly Gly Ser Gln His Gly Pro Phe 1410 1415 1420
Thr Gly Ile Ala Cys Gly Lys Ser Met Met Ser Ser Val Ser Leu Met 1425 1430 1435 1440
Gly Gly Arg Gly Gly Val Pro Leu Tyr Asp Arg Asn His Val Thr Gly 1445 1450 1455
Ala Ser Ser Ser Ser Ser Ser Ser Thr Lys Ala Thr Leu Tyr Pro Pro 1460 1465 1470
Ile Leu Asn Pro Pro Pro Ser Pro Ala Thr Asp Pro Ser Leu Tyr Asn 1475 1480 1485
Met Asp Met Phe Tyr Ser Ser Asn Ile Pro Ala Thr Val Arg Pro Tyr 1490 1495 1500
Arg Pro Tyr Ile Ile Arg Gly Met Ala Pro Pro Thr Thr Pro Cys Ser 1505 1510 1515 1520
Thr Asp Val Cys Asp Ser Asp Tyr Ser Ala Ser Arg Trp Lys Ala Ser 1525 1530 1535
Lys Tyr Tyr Leu Asp Leu Asn Ser Asp Ser Asp Pro Tyr Pro Pro Pro 1540 1545 1550
Pro Thr Pro His Ser Gln Tyr Leu Ser Ala Glu Asp Ser Cys Pro Pro 1555 1560 1565
Ser Pro Ala Thr Glu Arg Ser Tyr Phe His Leu Phe Pro Pro Pro Pro 1570 1575 1580
Ser Pro Cys Thr Asp Ser Ser 1585 1590

SEQ ID NO: 5 (length: 432 base pairs; type: nucleic acid; strandedness: single; topology: linear)
ATGGAGCCCG AGTGAGCGCG GCGCGGGCCC GTCCGGCCGC CGGACAACAT GGAGGCAGCG 60
CCGCCCGGGC CGCCGTGGCC GCTGCTGCTG CTGCTGCTGC TGCTGCTGGC GCTGTGCGGC 120
TGCCCGGCCC CCGCCGCGGC CTCGCCGCTC CTGCTATTTG CCAACCGCCG GGACGTACGG 180
CTGGTGGACG CCGGCGGAGT CAAGCTGGAG TCCACCATCG TGGTCAGCGG CCTGGAGGAT 240
GCGGCCGCAG TGGACTTCCA GTTTTCCAAG GGAGCCGTGT ACTGGACAGA CGTGAGCGAG 300
GAGGCCATCA AGCAGACCTA CCTGAACCAG ACGGGGGCCG CCGTGCAGAA CGTGGTCATC 360
TCCGGCCTGG TCTCTCCCGA CGGCCTCGCC TGCGACTGGG TGGGCAAGAA GCTGTACTGG 420
ACGGACTCAG AG 432

SEQ ID NO: 6 (length: 443 base pairs; type: nucleic acid; strandedness: single; topology: linear)
ACCGCCGCCG CGCGCGCCAT GGAGCCCGAG TGAGCGCGCG GCGCTCCCGG CCGCCGGACG 60
ACATGGAAAC GGCGCCGACC CGGGCCCCTC CGCCGCCGCC GCCGCCGCTG CTGCTGCTGG 120
TGCTGTACTG CAGCTTGGTC CCCGCCGCGG CCTCACCGCT CCTGTTGTTT GCCAACCGCC 180
GGGATGTGCG GCTAGTGGAT GCCGGCGGAG TGAAGCTGGA GTCCACCATT GTGGCCAGTG 240
GCCTGGAGGA TGCAGCTGCT GTAGACTTCC AGTTCTCCAA GGGTGCTGTG TACTGGACAG 300
ATGTGAGCGA GGAGGCCATC AAACAGACCT ACCTGAACCA GACTGGAGGT GCTGCACAGA 360
ACATTGTCAT CTCGGGCCTC GTGTCACCTG ATGGCCTGGC CTGTGACTGG GTTGGCAAGA 420
AGCTGTACTG GACGGACTCC GAG 443

SEQ ID NO: 7 (length: 550 amino acids; type: amino acid; topology: linear)
Met Glu Ala Ala Pro Pro Gly Pro Pro Trp Pro Leu Leu Leu Leu Leu 1 5 10 15
Leu Leu Leu Leu Ala Leu Cys Gly Cys Pro Ala Pro Ala Ala Ala Ser 20 25 30
Pro Leu Leu Leu Phe Ala Asn Arg Arg Asp Val Arg Leu Val Asp Ala 35 40 45
Gly Gly Val Lys Leu Glu Ser Thr Ile Val Val Ser Gly Leu Glu Asp 50 55 60
Ala Ala Ala Val Asp Phe Gln Phe Ser Lys Gly Ala Val Tyr Trp Thr 65 70 75 80
Asp Val Ser Glu Glu Ala Ile Lys Gln Thr Tyr Leu Asn Gln Thr Gly 85 90 95
Ala Ala Val Gln Asn Val Val Ile Ser Gly Leu Val Ser Pro Asp Gly 100 105 110
Leu Ala Cys Asp Trp Val Gly Lys Lys Leu Tyr Trp Thr Asp Ser Glu 115 120 125
Thr Asn Arg Ile Glu Val Ala Asn Leu Asn Gly Thr Ser Arg Lys Val 130 135 140
Leu Phe Trp Gln Asp Leu Asp Gln Pro Arg Ala Ile Ala Leu Asp Pro 145 150 155 160
Ala His Gly Tyr Met Tyr Trp Thr Asp Trp Gly Glu Thr Pro Arg Ile 165 170 175
Glu Arg Ala Gly Met Asp Gly Ser Thr Arg Lys Ile Ile Val Asp Ser 180 185 190
Asp Ile Tyr Trp Pro Asn Gly Leu Thr Ile Asp Leu Glu Glu Gln Lys 195 200 205
Leu Tyr Trp Ala Asp Ala Lys Leu Ser Phe Ile His Arg Ala Asn Leu 210 215 220
Asp Gly Ser Phe Arg Gln Lys Val Val Glu Gly Ser Leu Thr His Pro 225 230 235 240
Phe Ala Leu Thr Leu Ser Gly Asp Thr Leu Tyr Trp Thr Asp Trp Gln 245 250 255
Thr Arg Ser Ile His Ala Cys Asn Lys Arg Thr Gly Gly Lys Arg Lys 260 265 270
Glu Ile Leu Ser Ala Leu Tyr Ser Pro Met Asp Ile Gln Val Leu Ser 275 280 285
Gln Glu Arg Gln Pro Phe Phe His Thr Arg Cys Glu Glu Asp Asn Gly 290 295 300
Gly Cys Ser His Leu Cys Leu Leu Ser Pro Ser Glu Pro Phe Tyr Thr 305 310 315 320
Cys Ala Cys Pro Thr Gly Val Gln Leu Gln Asp Asn Gly Arg Thr Cys 325 330 335
Lys Ala Gly Ala Glu Glu Val Leu Leu Leu Ala Arg Arg Thr Asp Leu 340 345 350
Arg Arg Ile Ser Leu Asp Thr Pro Asp Phe Thr Asp Ile Val Leu Gln 355 360 365
Val Asp Asp Ile Arg His Ala Ile Ala Ile Asp Tyr Asp Pro Leu Glu 370 375 380
Gly Tyr Val Tyr Trp Thr Asp Asp Glu Val Arg Ala Ile Arg Arg Ala 385 390 395 400
Tyr Leu Asp Gly Ser Gly Ala Gln Thr Leu Val Asn Thr Glu Ile Asn 405 410 415
Asp Pro Asp Gly Ile Ala Val Asp Trp Val Ala Arg Asn Leu Tyr Trp 420 425 430
Thr Asp Thr Gly Thr Asp Arg Ile Glu Val Thr Arg Leu Asn Gly Thr 435 440 445
Ser Arg Lys Ile Leu Val Ser Glu Asp Leu Asp Glu Pro Arg Ala Ile 450 455 460
Ala Leu His Pro Val Met Gly Leu Met Tyr Trp Thr Asp Trp Gly Glu 465 470 475 480
Asn Pro Lys Ile Glu Cys Ala Asn Leu Asp Gly Gln Glu Arg Arg Val 485 490 495
Leu Val Asn Ala Ser Leu Gly Trp Pro Asn Gly Leu Ala Leu Asp Leu 500 505 510
Gln Glu Gly Lys Leu Tyr Trp Gly Asp Ala Lys Thr Asp Lys Ile Glu 515 520 525
Val Ile Asn Val Asp Gly Thr Lys Arg Arg Thr Leu Leu Glu Asp Lys 530 535 540
Leu Pro His Ile Phe Gly 545 550

SEQ ID NO: 8 (length: 533 amino acids; type: amino acid; topology: linear)
Met Glu Thr Ala Pro Thr Arg Ala Pro Pro Pro Pro Pro Pro Pro Leu 1 5 10 15
Leu Leu Leu Val Leu Tyr Cys Ser Leu Val Pro Ala Ala Ala Ser Pro 20 25 30
Leu Leu Leu Phe Ala Asn Arg Arg Asp Val Arg Leu Val Asp Ala Gly 35 40 45
Gly Val Lys Leu Glu Ser Thr Ile Val Ala Ser Gly Leu Glu Asp Ala 50 55 60
Ala Ala Val Asp Phe Gln Phe Ser Lys Gly Ala Val Tyr Trp Thr Asp 65 70 75 80
Val Ser Glu Glu Ala Ile Lys Gln Thr Tyr Leu Asn Gln Thr Gly Gly 85 90 95
Ala Ala Gln Asn Ile Val Ile Ser Gly Leu Val Ser Pro Asp Gly Leu 100 105 110
Ala Cys Asp Trp Val Gly Lys Lys Leu Tyr Trp Thr Asp Ser Glu Thr 115 120 125
Asn Arg Ile Glu Val Ala Asn Leu Asn Gly Thr Ser Arg Lys Val Leu 130 135 140
Phe Trp Gln Asp Leu Asp Gln Pro Arg Ala Ile Ala Leu Asp Pro Ala 145 150 155 160
His Gly Tyr Met Tyr Trp Thr Asp Trp Gly Glu Ala Pro Arg Ile Glu 165 170 175
Arg Ala Gly Met Asp Gly Ser Thr Arg Lys Ile Ile Val Asp Ser Asp 180 185 190
Ile Tyr Trp Pro Asn Gly Leu Thr Ile Asp Leu Glu Glu Gln Lys Leu 195 200 205
Tyr Trp Ala Asp Ala Lys Leu Ser Phe Ile His Arg Ala Asn Leu Asp 210 215 220
Gly Ser Phe Arg Gln Lys Val Val Glu Gly Ser Leu Thr His Pro Phe 225 230 235 240
Ala Leu Thr Leu Ser Gly Asp Thr Leu Tyr Trp Thr Asp Trp Gln Thr 245 250 255
Arg Ser Ile His Ala Cys Asn Lys Trp Thr Gly Glu Gln Arg Lys Glu 260 265 270
Ile Leu Ser Ala Leu Tyr Ser Pro Met Asp Ile Gln Val Leu Ser Gln 275 280 285
Glu Arg Gln Pro Pro Phe His Thr Pro Cys Glu Glu Asp Asn Gly Gly 290 295 300
Cys Ser His Leu Cys Leu Leu Ser Pro Arg Glu Pro Phe Tyr Ser Cys 305 310 315 320
Ala Cys Pro Thr Gly Val Gln Leu Gln Asp Asn Gly Lys Thr Cys Lys 325 330 335
Thr Gly Ala Glu Glu Val Leu Leu Leu Ala Arg Arg Thr Asp Leu Arg 340 345 350
Arg Ile Ser Leu Asp Thr Pro Asp Phe Thr Asp Ile Val Leu Gln Val 355 360 365
Gly Asp Ile Arg His Ala Ile Ala Ile Asp Tyr Asp Pro Leu Glu Gly 370 375 380
Tyr Val Tyr Trp Thr Asp Asp Glu Val Arg Ala Ile Arg Arg Ala Tyr 385 390 395 400
Leu Asp Gly Ser Gly Ala Gln Thr Leu Val Asn Thr Glu Ile Asn Asp 405 410 415
Pro Asp Gly Ile Ala Val Asp Trp Val Ala Arg Asn Leu Tyr Trp Thr 420 425 430
Asp Thr Gly Thr Asp Arg Ile Glu Val Thr Arg Leu Asn Gly Thr Ser 435 440 445
Arg Lys Ile Leu Val Ser Glu Asp Leu Asp Glu Pro Arg Ala Ile Val 450 455 460
Leu His Pro Val Met Gly Leu Met Tyr Trp Thr Asp Trp Gly Glu Asn 465 470 475 480
Pro Lys Ile Glu Cys Ala Asn Leu Asp Gly Arg Asp Arg His Val Leu 485 490 495
Val Asn Thr Ser Leu Gly Trp Pro Asn Gly Leu Ala Leu Asp Leu Gln 500 505 510
Glu Gly Lys Leu Tyr Trp Gly Asp Ala Lys Thr Asp Lys Ile Glu Val 515 520 525
Ile Asn Ile Asp Gly 530

SEQ ID NO: 9 (length: 38 amino acids; type: amino acid; topology: linear)
Cys Glu Glu Asp Asn Gly Gly Cys Ser His Leu Cys Leu Leu Ser Pro 1 5 10 15
Ser Glu Pro Phe Tyr Thr Cys Ala Cys Pro Thr Gly Val Gln Leu Gln 20 25 30
Asp Asn Gly Arg Thr Cys 35

SEQ ID NO: 10 (length: 37 amino acids; type: amino acid; topology: linear)
Cys Lys Val Asn Asn Gly Gly Cys Ser Asn Leu Cys Leu Leu Ser Pro 1 5 10 15
Gly Gly Gly His Lys Cys Ala Cys Pro Thr Asn Phe Tyr Leu Gly Ser 20 25 30
Asp Gly Arg Thr Cys 35

SEQ ID NO: 11 (length: 41 amino acids; type: amino acid; topology: linear)
Gly Thr Asn Pro Cys Ala Asp Arg Asn Gly Gly Cys Ser His Leu Cys 1 5 10 15
Phe Phe Thr Pro His Ala Thr Arg Cys Gly Cys Pro Ile Gly Leu Glu 20 25 30
Leu Leu Ser Asp Met Lys Thr Cys Ile 35 40

SEQ ID NO: 12 (length: 41 amino acids; type: amino acid; topology: linear)
Gly Thr Asn Lys Cys Arg Val Asn Asn Gly Gly Cys Ser Ser Leu Cys 1 5 10 15
Leu Ala Thr Pro Gly Ser Arg Gln Cys Ala Cys Ala Glu Asp Gln Val 20 25 30
Leu Asp Ala Asp Gly Val Thr Cys Leu 35 40

SEQ ID NO: 13 (length: 40 amino acids; type: amino acid; topology: linear)
Gly Leu Asn Asp Cys Met His Asn Asn Gly Gln Cys Gly Gln Leu Cys 1 5 10 15
Leu Ala Ile Pro Gly Gly His Arg Cys Gly Cys Ala Ser His Tyr Thr 20 25 30
Leu Asp Pro Ser Ser Arg Asn Cys 35 40

SEQ ID NO: 14 (length: 40 amino acids; type: amino acid; topology: linear)
Gly Thr Asn Lys Cys Arg Val Asn Asn Gly Gly Cys Ser Ser Leu Cys 1 5 10 15
Leu Ala Thr Pro Gly Ser Arg Gln Cys Ala Cys Ala Glu Asp Gln Val 20 25 30
Leu Asp Ala Asp Gly Val Thr Cys 35 40

SEQ ID NO: 15 (length: 39 amino acids; type: amino acid; topology: linear)
His Pro Cys Ala Arg Asp Asn Gly Gly Cys Ser His Ile Cys Ile Ala 1 5 10 15
Lys Gly Asp Gly Thr Pro Arg Cys Ser Cys Pro Val His Leu Val Leu 20 25 30
Leu Gln Asn Leu Leu Thr Cys 35

SEQ ID NO: 16 (length: 39 amino acids; type: amino acid; topology: linear)
His Pro Cys Lys Val Asn Asn Gly Gly Cys Ser Asn Leu Cys Leu Leu 1 5 10 15
Ser Pro Gly Gly Gly His Lys Cys Ala Cys Pro Thr Asn Phe Tyr Leu 20 25 30
Gly Ser Asp Gly Arg Thr Cys 35

SEQ ID NO: 17 (length: 39 amino acids; type: amino acid; topology: linear)
Pro Thr Cys Ser Pro Asp Gln Phe Ala Cys Ala Thr Gly Glu Ile Asp 1 5 10 15
Cys Ile Pro Gly Ala Trp Arg Cys Asp Gly Phe Pro Glu Cys Asp Asp 20 25 30
Gln Ser Asp Glu Glu Gly Cys 35

SEQ ID NO: 18 (length: 37 amino acids; type: amino acid; topology: linear)
Pro Arg Cys Asp Met Asp Gln Phe Gln Cys Lys Ser Gly His Cys Ile 1 5 10 15
Pro Leu Arg Trp Arg Cys Asp Ala Asp Ala Asp Cys Met Asp Gly Ser 20 25 30
Asp Glu Glu Ala Cys 35

SEQ ID NO: 19 (length: 36 amino acids; type: amino acid; topology: linear)
Cys Ser Ala Ala Gln Phe Pro Cys Ala Arg Gly Gln Cys Val Asp Leu 1 5 10 15
Arg Leu Arg Cys Asp Gly Glu Ala Asp Cys Gln Asp Arg Ser Asp Glu 20 25 30
Ala Asp Cys Asp 35

SEQ ID NO: 20 (length: 36 amino acids; type: amino acid; topology: linear)
Cys Arg Pro Gly Gln Phe Gln Cys Ser Thr Gly Ile Cys Thr Asn Pro 1 5 10 15
Ala Phe Ile Cys Asp Gly Asp Asn Asp Cys Gln Asp Asn Ser Asp Glu 20 25 30
Ala Asn Cys Asp 35

SEQ ID NO: 21 (length: 35 amino acids; type: amino acid; topology: linear)
Cys Leu Pro Asn Gln Phe Arg Cys Ala Ser Gly Gln Cys Val Leu Ile 1 5 10 15
Lys Gln Gln Cys Asp Ser Phe Pro Asp Cys Ile Asp Gly Ser Asp Glu 20 25 30
Leu Met Cys 35

SEQ ID NO: 22 (length: 35 amino acids; type: amino acid; topology: linear)
Cys Asp Met Asp Gln Phe Gln Cys Lys Ser Gly His Cys Ile Pro Leu 1 5 10 15
Arg Trp Arg Cys Asp Ala Asp Ala Asp Cys Met Asp Gly Ser Asp Glu 20 25 30
Glu Ala Cys 35

SEQ ID NO: 23 (length: 5166 base pairs; type: nucleic acid; strandedness: single; topology: linear)
GAGAGGACAC CGCATTCTTC TTCTCCAGAG GATGCAGCAG CAAGGCGCCA TCTTGAAACC 60
AGAGACCAAA CCAACCAGCA WTTTTGTCTT GAACTTCCCA GCCTCCACAA CTAATATAAA 120
CCCCATGAGG GCAGAGGCGT TCAGCCTGAC TCCAGCCTGG CAAAGCTGTC ACAAATCTGG 180
AGGAACACAC ACGTTCACGG GCACTCAGTT CTGTGAGCCT CGCCGCTCCT GCTATTTGCC 240
AACCGCCGGG ACGTACGGCT GGTGGACGCC GGCGGAGTCA AGCTGGAGTC CACCATCGTG 300
GTCAGCGGCC TGGAGGATGC GGCCGCAGTG GACTTCCAGT TTTCCAAGGG AGCCGTGTAC 360
TGGACAGACG TGAGCGAGGA GGCCATCAAG CAGACCTACC TGAACCAGAC GGGGGCCGCC 420
GTGCAGAACG TGGTCATCTC CGGCCTGGTC TCTCCCGACG GCCTCGCCTG CGACTGGGTG 480
GGCAAGAAGC TGTACTGGAC GGACTCAGAG ACCAACCGCA TCGAGGTGGC CAACCTCAAT 540
GGCACATCCC GGAAGGTGCT CTTCTGGCAG GACCTTGACC AGCCGAGGGC CATCGCCTTG 600
GACCCCGCTC ACGGGTACAT GTACTGGACA GACTGGGGTG AGACGCCCCG GATTGAGCGG 660
GCAGGGATGG ATGGCAGCAC CCGGAAGATC ATTGTGGACT CGGACATTTA CTGGCCCAAT 720
GGACTGACCA TCGACCTGGA GGAGCAGAAG CTCTACTGGG CTGACGCCAA GCTCAGCTTC 780
ATCCACCGTG CCAACCTGGA CGGCTCGTTC CGGCAGAAGG TGGTGGAGGG CAGCCTGACG 840
CACCCCTTCG CCCTGACGCT CTCCGGGGAC ACTCTGTACT GGACAGACTG GCAGACCCGC 900
TCCATCCATG CCTGCAACAA GCGCACTGGG GGGAAGAGGA AGGAGATCCT GAGTGCCCTC 960
TACTCACCCA TGGACATCCA GGTGCTGAGC CAGGAGCGGC AGCCTTTCTT CCACACTCGC 1020
TGTGAGGAGG ACAATGGCGG CTGCTCCCAC CTGTGCCTGC TGTCCCCAAG CGAGCCTTTC 1080
TACACATGCG CCTGCCCCAC GGGTGTGCAG CTGCAGGACA ACGGCAGGAC GTGTAAGGCA 1140
GGAGCCGAGG AGGTGCTGCT GCTGGCCCGG CGGACGGACC TACGGAGGAT CTCGCTGGAC 1200
ACGCCGGACT TTACCGACAT CGTGCTGCAG GTGGACGACA TCCGGCACGC CATTGCCATC 1260
GACTACGACC CGCTAGAGGG CTATGTCTAC TGGACAGATG ACGAGGTGCG GGCCATCCGC 1320
AGGGCGTACC TGGACGGGTC TGGGGCGCAG ACGCTGGTCA ACACCGAGAT CAACGACCCC 1380
GATGGCATCG CGGTCGACTG GGTGGCCCGA AACCTCTACT GGACCGACAC GGGCACGGAC 1440
CGCATCGAGG TGACGCGCCT CAACGGCACC TCCCGCAAGA TCCTGGTGTC GGAGGACCTG 1500
GACGAGCCCC GAGCCATCGC ACTGCACCCC GTGATGGGCC TCATGTACTG GACAGACTGG 1560
GGAGAGAACC CTAAAATCGA GTGTGCCAAC TTGGATGGGC AGGAGCGGCG TGTGCTGGTC 1620
AATGCCTCCC TCGGGTGGCC CAACGGCCTG GCCCTGGACC TGCAGGAGGG GAAGCTCTAC 1680
TGGGGAGACG CCAAGACAGA CAAGATCGAG GTGATCAATG TTGATGGGAC GAAGAGGCGG 1740
ACCCTCCTGG AGGACAAGCT CCCGCACATT TTCGGGTTCA CGCTGCTGGG GGACTTCATC 1800
TACTGGACTG ACTGGCAGCG CCGCAGCATC GAGCGGGTGC ACAAGGTCAA GGCCAGCCGG 1860
GACGTCATCA TTGACCAGCT GCCCGACCTG ATGGGGCTCA AAGCTGTGAA TGTGGCCAAG 1920
GTCGTCGGAA CCAACCCGTG TGCGGACAGG AACGGGGGGT GCAGCCACCT GTGCTTCTTC 1980
ACACCCCACG CAACCCGGTG TGGCTGCCCC ATCGGCCTGG AGCTGCTGAG TGACATGAAG 2040
ACCTGCATGT GCCTGAGGCC TTCTTGGTCT TCACCAGCAG AGCCGCCATC CACAGGATCT 2100
CCCTCGAGAC CAATAACAAC GACGTGGCCA TCCCGCTCAC GGGCGTCAAG GAGGCCTCAG 2160
CCCTGGACTT TGATGTGTCC AACAACCACA TCTACTGGAC AGACGTCAGC CTGAAGACCA 2220
TCAGCCGCGC CTTCATGAAC GGGAGCTCGG TGGAGCACGT GGTGGAGTTT GGCCTTGACT 2280
ACCCCGAGGG CATGGCCGTT GACTGGATGG GCAAGAACCT CTACTGGGCC GACACTGGGA 2340
CCAACAGAAT CGAAGTGGCG CGGCTGGACG GGCAGTTCCG GCAAGTCCTC GTGTGGAGGG 2400
ACTTGGACAA CCCGAGGTCG CTGGCCCTGG ATCCCACCAA GGGCTACATC TACTGGACCG 2460
AGTGGGGCGG CAAGCCGAGG ATCGTGCGGG CCTTCATGGA CGGGACCAAC TGCATGACGC 2520
TGGTGGACAA GGTGGGCCGG GCCAACGACC TCACCATTGA CTACGCTGAC CAGCGCCTCT 2580
ACTGGACCGA CCTGGACACC AACATGATCG AGTCGTCCAA CATGCTGGGT CAGGAGCGGG 2640
TCGTGATTGC CGACGATCTC CCGCACCCGT TCGGTCTGAC GCAGTACAGC GATTATATCT 2700
ACTGGACAGA CTGGAATCTG CACAGCATTG AGCGGGCCGA CAAGACTAGC GGCCGGAACC 2760
GCACCCTCAT CCAGGGCCAC CTGGACTTCG TGATGGACAT CCTGGTGTTC CACTCCTCCC 2820
GCCAGGATGG CCTCAATGAC TGTATGCACA ACAACGGGCA GTGTGGGCAG CTGTGCCTTG 2880
CCATCCCCGG CGGCCACCGC TGCGGCTGCG CCTCACACTA CACCCTGGAC CCCAGCAGCC 2940
GCAACTGCAG CCCGCCCACC ACCTTCTTGC TGTTCAGCCA GAAATCTGCC ATCAGTCGGA 3000
TGATCCCGGA CGACCAGCAC AGCCCGGATC TCATCCTGCC CCTGCATGGA CTGAGGAACG 3060
TCAAAGCCAT CGACTATGAC CCACTGGACA AGTTCATCTA CTGGGTGGAT GGGCGCCAGA 3120
ACATCAAGCG AGCCAAGGAC GACGGGACCC AGCCCTTTGT TTTGACCTCT CTGAGCCAAG 3180
GCCAAAACCC AGACAGGCAG CCCCACGACC TCAGCATCGA CATCTACAGC CGGACACTGT 3240
TCTGGACGTG CGAGGCCACC AATACCATCA ACGTCCACAG GCTGAGCGGG GAAGCCATGG 3300
GGGTGGTGCT GCGTGGGGAC CGCGACAAGC CCAGGGCCAT CGTCGTCAAC GCGGAGCGAG 3360
GGTACCTGTA CTTCACCAAC ATGCAGGACC GGGCAGCCAA GATCGAACGC GCAGCCCTGG 3420
ACGGCACCGA GCGCGAGGTC CTCTTCACCA CCGGCCTCAT CCGCCCTGTG GCCCTGGTGG 3480
TAGACAACAC ACTGGGCAAG CTGTTCTGGG TGGACGCGGA CCTGAAGCGC ATTGAGAGCT 3540
GTGACCTGTC AGGGGCCAAC CGCCTGACCC TGGAGGACGC CAACATCGTG CAGCCTCTGG 3600
GCCTGACCAT CCTTGGCAAG CATCTCTACT GGATCGACCG CCAGCAGCAG ATGATCGAGC 3660
GTGTGGAGAA GACCACCGGG GACAAGCGGA CTCGCATCCA GGGCCGTGTC GCCCACCTCA 3720
CTGGCATCCA TGCAGTGGAG GAAGTCAGCC TGGAGGAGTT CTCAGCCCAC CCATGTGCCC 3780
GTGACAATGG TGGCTGCTCC CACATCTGTA TTGCCAAGGG TGATGGGACA CCACGGTGCT 3840
CATGCCCAGT CCACCTCGTG CTCCTGCAGA ACCTGCTGAC CTGTGGAGAG CCGCCCACCT 3900
GCTCCCCGGA CCAGTTTGCA TGTGCCACAG GGGAGATCGA CTGTATCCCC GGGGCCTGGC 3960
GCTGTGACGG CTTTCCCGAG TGCGATGACC AGAGCGACGA GGAGGGCTGC CCCGTGTGCT 4020
CCGCCGCCCA GTTCCCCTGC GCGCGGGGTC AGTGTGTGGA CCTGCGCCTG CGCTGCGACG 4080
GCGAGGCAGA CTGTCAGGAC CGCTCAGACA GGCGGACTGT GACGCCATCT GCCTGCCCAA 4140
CCAGTTCCGG TGTGCGAGCG GCCAGTGTGT CCTCATCAAA CAGCAGTGCG ACTCCTTCCC 4200
CGACTGTATC GACGGCTCCG ACGAGCTCAT GTGTGAAATC ACCAAGCCGC CCTCAGACGA 4260
CAGCCCGGCC CACAGCAGTG CCATCGGGCC CGTCATTGGC ATCATCCTCT CTCTCTTCGT 4320
CATGGGTGGT GTCTATTTTG TGTGCCAGCG CGTGGTGTGC CAGCGCTATG CGGGGGCCAA 4380
CGGGCCCTTC CCGCACGAGT ATGTCAGCGG GACCCCGCAC GTGCCCCTCA ATTTCATAGC 4440
CCCGGGCGGT TCCCAGCATG GCCCCTTCAC AGGCATCGCA TGCGGAAAGT CCATGATGAG 4500
CTCCGTGAGC CTGATGGGGG GCCGGGGCGG GGTGCCCCTC TACGACCGGA ACCACGTCAC 4560
AGGGGCCTCG TCCAGCAGCT CGTCCAGCAC GAAGGCCACG CTGTACCCGC CGATCCTGAA 4620
CCCGCCGCCC TCCCCGGCCA CGGACCCCTC CCTGTACAAC ATGGACATGT TCTACTCTTC 4680
AAACATTCCG GCCACTGTGA GACCGTACAG GCCCTACATC ATTCGAGGAA TGGCGCCCCC 4740
GACGACGCCC TGCAGCACCG ACGTGTGTGA CAGCGACTAC AGCGCCAGCC GCTGGAAGGC 4800
CAGCAAGTAC TACCTGGATT TGAACTCGGA CTCAGACCCC TATCCACCCC CACCCACGCC 4860
CCACAGCCAG TACCTGTCGG CGGAGGACAG CTGCCCGCCC TCGCCCGCCA CCGAGAGGAG 4920
CTACTTCCAT CTCTTCCCGC CCCCTCCGTC CCCCTGCACG GACTCATCCT GACCTCGGCC 4980
GGGCCACTCT GGCTTCTCTG TGCCCCTGTA AATAGTTTTA AATATGAACA AAGAAAAAAA 5040
TATATTTTAT GATTTAAAAA ATAAATATAA TTGGGATTTT AAAAACATGA GAAATGTGAA 5100
CTGTGATGGG GTGGGCAGGG CTGGGAGAAC TTTGTACAGT GGAACAAATA TTTATAAACT 5160
TAATTT 5166

SEQ ID NO: 24 (length: 4351 base pairs; type: nucleic acid; strandedness: single; topology: linear)
ATGTACTGGA CAGACTGGGG TGAGACGCCC CGGATTGAGC GGGCAGGGAT GGATGGCAGC 60
ACCCGGAAGA TCATTGTGGA CTCGGACATT TACTGGCCCA ATGGACTGAC CATCGACCTG 120
GAGGAGCAGA AGCTCTACTG GGCTGACGCC AAGCTCAGCT TCATCCACCG TGCCAACCTG 180
GACGGCTCGT TCCGGCAGAA GGTGGTGGAG GGCAGCCTGA CGCACCCCTT CGCCCTGACG 240
CTCTCCGGGG ACACTCTGTA CTGGACAGAC TGGCAGACCC GCTCCATCCA TGCCTGCAAC 300
AAGCGCACTG GGGGGAAGAG GAAGGAGATC CTGAGTGCCC TCTACTCACC CATGGACATC 360
CAGGTGCTGA GCCAGGAGCG GCAGCCTTTC TTCCACACTC GCTGTGAGGA GGACAATGGC 420
GGCTGCTCCC ACCTGTGCCT GCTGTCCCCA AGCGAGCCTT TCTACACATG CGCCTGCCCC 480
ACGGGTGTGC AGCTGCAGGA CAACGGCAGG ACGTGTAAGG CAGGAGCCGA GGAGGTGCTG 540
CTGCTGGCCC GGCGGACGGA CCTACGGAGG ATCTCGCTGG ACACGCCGGA CTTTACCGAC 600
ATCGTGCTGC AGGTGGACGA CATCCGGCAC GCCATTGCCA TCGACTACGA CCCGCTAGAG 660
GGCTATGTCT ACTGGACAGA TGACGAGGTG CGGGCCATCC GCAGGGCGTA CCTGGACGGG 720
TCTGGGGCGC AGACGCTGGT CAACACCGAG ATCAACGACC CCGATGGCAT CGCGGTCGAC 780
TGGGTGGCCC GAAACCTCTA CTGGACCGAC ACGGGCACGG ACCGCATCGA GGTGACGCGC 840
CTCAACGGCA CCTCCCGCAA GATCCTGGTG TCGGAGGACC TGGACGAGCC CCGAGCCATC 900
GCACTGCACC CCGTGATGGG CCTCATGTAC TGGACAGACT GGGGAGAGAA CCCTAAAATC 960
GAGTGTGCCA ACTTGGATGG GCAGGAGCGG CGTGTGCTGG TCAATGCCTC CCTCGGGTGG 1020
CCCAACGGCC TGGCCCTGGA CCTGCAGGAG GGGAAGCTCT ACTGGGGAGA CGCCAAGACA 1080
GACAAGATCG AGGTGATCAA TGTTGATGGG ACGAAGAGGC GGACCCTCCT GGAGGACAAG 1140
CTCCCGCACA TTTTCGGGTT CACGCTGCTG GGGGACTTCA TCTACTGGAC TGACTGGCAG 1200
CGCCGCAGCA TCGAGCGGGT GCACAAGGTC AAGGCCAGCC GGGACGTCAT CATTGACCAG 1260
CTGCCCGACC TGATGGGGCT CAAAGCTGTG AATGTGGCCA AGGTCGTCGG AACCAACCCG 1320
TGTGCGGACA GGAACGGGGG GTGCAGCCAC CTGTGCTTCT TCACACCCCA CGCAACCCGG 1380
TGTGGCTGCC CCATCGGCCT GGAGCTGCTG AGTGACATGA AGACCTGCAT CGTGCCTGAG 1440
GCCTTCTTGG TCTTCACCAG CAGAGCCGCC ATCCACAGGA TCTCCCTCGA GACCAATAAV 1500
AACGACGTGG CCATCCCGCT CACGGGCGTC AAGGAGGCCT CAGCCCTGGA CTTTGATGTG 1560
TCCAACAACC ACATCTACTG GACAGACGTC AGCCTGAAGA CCATCAGCCG CGCCTTCATG 1620
AACGGGAGCT CGGTGGAGCA CGTGGTGGAG TTTGGCCTTG ACTACCCCGA GGGCATGGCC 1680
GTTGACTGGA TGGGCAAGAA CCTCTACTGG GCCGACACTG GGACCAACAG AATCGAAGTG 1740
GCGCGGCTGG ACGGGCAGTT CCGGCAAGTC CTCGTGTGGA GGGACTTGGA CAACCCGAGG 1800
TCGCTGGCCC TGGATCCCAC CAAGGGCTAC ATCTACTGGA CCGAGTGGGG CGGCAAGCCG 1860
AGGATCGTGC GGGCCTTCAT GGACGGGACC AACTGCATGA CGCTGGTGGA CAAGGTGGGC 1920
CGGGCCAACG ACCTCACCAT TGACTACGCT GACCAGCGCC TCTACTGGAC CGACCTGGAC 1980
ACCAACATGA TCGAGTCGTC CAACATGCTG GGTCAGGAGC GGGTCGTGAT TGCCGACGAT 2040
CTCCCGCACC GTTCGGTCTG ACGCAGTACA GCGATTATAT CTACTGGACA GACTGGAATC 2100
TGCACAGCAT TGAGCGGGCC GACAAGACTA GCGGCCGGAA CCGCACCCTC ATCCAGGGCC 2160
ACCTGGACTT CGTGATGGAC ATCCTGGTGT TCCACTCCTC CCGCCAGGAT GGCCTCAATG 2220
ACTGTATGCA CAACAACGGG CAGTGTGGGC AGCTGTGCCT TGCCATCCCC GGCGGCCACC 2280
GCTGCGGCTG CGCCTCACAC TACACCCTGG ACCCCAGCAG CCGCAACTGC AGCCCGCCCA 2340
CCACCTTCTT GCTGTTCAGC CAGAAATCTG CCATCAGTCG GATGATCCCG GACGACCAGC 2400
ACAGCCCGGA TCTCATCCTG CCCCTGCATG GACTGAGGAA CGTCAAAGCC ATCGACTATG 2460
ACCCACTGGA CAAGTTCATC TACTGGGTGG ATGGGCGCCA GAACATCAAG CGAGCCAAGG 2520
ACGACGGGAC CCAGCCCTTT GTTTTGACCT CTCTGAGCCA AGGCCAAAAC CCAGACAGGC 2580
AGCCCCACGA CCTCAGCATC GACATCTACA GCCGGACACT GTTCTGGACG TGCGAGGCCA 2640
CCAATACCAT CAACGTCCAC AGGCTGAGCG GGGAAGCCAT GGGGGTGGTG CTGCGTGGGG 2700
ACCGCGACAA GCCCAGGGCC ATCGTCGTCA ACGCGGAGCG AGGGTACCTG TACTTCACCA 2760
ACATGCAGGA CCGGGCAGCC AAGATCGAAC GCGCAGCCCT GGACGGCACC GAGCGCGAGG 2820
TCCTCTTCAC CACCGGCCTC ATCCGCCCTG TGGCCCTGGT GGTAGACAAC ACACTGGGCA 2880
AGCTGTTCTG GGTGGACGCG GACCTGAAGC GCATTGAGAG CTGTGACCTG TCAGGGGCCA 2940
ACCGCCTGAC CCTGGAGGAC GCCAACATCG TGCAGCCTCT GGGCCTGACC ATCCTTGGCA 3000
AGCATCTCTA CTGGATCGAC CGCCAGCAGC AGATGATCGA GCGTGTGGAG AAGACCACCG 3060
GGGACAAGCG GACTCGCATC CAGGGCCGTG TCGCCCACCT CACTGGCATC CATGCAGTGG 3120
AGGAAGTCAG CCTGGAGGAG TTCTCAGCCC ACCCATGTGC CCGTGACAAT GGTGGCTGCT 3180
CCCACATCTG TATTGCCAAG GGTGATGGGA CACCACGGTG CTCATGCCCA GTCCACCTCG 3240
TGCTCCTGCA GAACCTGCTG ACCTGTGGAG AGCCGCCCAC CTGCTCCCCG GACCAGTTTG 3300
CATGTGCCAC AGGGGAGATC GACTGTATCC CCGGGGCCTG GCGCTGTGAC GGCTTTCCCG 3360
AGTGCGATGA CCAGAGCGAC GAGGAGGGCT GCCCCGTGTG CTCCGCCGCC CAGTTCCCCT 3420
GCGCGCGGGG TCAGTGTGTG GACCTGCGCC TGCGCTGCGA CGGCGAGGCA GACTGTCAGG 3480
ACCGCTCAGA CGAGGCGGAC TGTGACGCCA TCTGCCTGCC CAACCAGTTC CGGTGTGCGA 3540
GCGGCCAGTG TGTCCTCATC AAACAGCAGT GCGACTCCTT CCCCGACTGT ATCGACGGCT 3600
CCGACGAGCT CATGTGTGAA ATCACCAAGC CGCCCTCAGA CGACAGCCCG GCCCACAGCA 3660
GTGCCATCGG GCCCGTCATT GGCATCATCC TCTCTCTCTT CGTCATGGGT GGTGTCTATT 3720
TTGTGTGCCA GCGCGTGGTG TGCCAGCGCT ATGCGGGGGC CAACGGGCCC TTCCCGCACG 3780
AGTATGTCAG CGGGACCCCG CACGTGCCCC TCAATTTCAT AGCCCCGGGC GGTTCCCAGC 3840
ATGGCCCCTT CACAGGCATC GCATGCGGAA AGTCCATGAT GAGCTCCGTG AGCCTGATGG 3900
GGGGCCGGGG CGGGGTGCCC CTCTACGACC GGAACCACGT CACAGGGGCC TCGTCCAGCA 3960
GCTCGTCCAG CACGAAGGCC ACGCTGTACC CGCCGATCCT GAACCCGCCG CCCTCCCCGG 4020
CCACGGACCC CTCCCTGTAC AACATGGACA TGTTCTACTC TTCAAACATT CCGGCCACTG 4080
TGAGACCGTA CAGGCCCTAC ATCATTCGAG AATGGCGCCC CCGACGACGC CCTGCAGCAC 4140
CGACGTGTGT GACAGCGACT ACAGCGCCAG CCGCTGGAAG GCCAGCAAGT ACTACCTGGA 4200
TTTGAACTCG GACTCAGACC CCTATCCACC CCCACCCACG CCCCACAGCC AGTACCTGTC 4260
GGCGGAGGAC AGCTGCCCGC CCTCGCCCGC CACCGAGAGG AGCTACTTCC ATCTCTTCCC 4320
GCCCCCTCCG TCCCCCTGCA CGGACTCATC C 4351

SEQ ID NO: 25 (length: 1451 amino acids; type: amino acid; topology: linear)
Met Tyr Trp Thr Asp Trp Gly Glu Thr Pro Arg Ile Glu Arg Ala Gly 1 5 10 15
Met Asp Gly Ser Thr Arg Lys Ile Ile Val Asp Ser Asp Ile Tyr Trp 20 25 30
Pro Asn Gly Leu Thr Ile Asp Leu Glu Glu Gln Lys Leu Tyr Trp Ala 35 40 45
Asp Ala Lys Leu Ser Phe Ile His Arg Ala Asn Leu Asp Gly Ser Phe 50 55 60
Arg Gln Lys Val Val Glu Gly Ser Leu Thr His Pro Phe Ala Leu Thr 65 70 75 80
Leu Ser Gly Asp Thr Leu Tyr Trp Thr Asp Trp Gln Thr Arg Ser Ile 85 90 95
His Ala Cys Asn Lys Arg Thr Gly Gly Lys Arg Lys Glu Ile Leu Ser 100 105 110
Ala Leu Tyr Ser Pro Met Asp Ile Gln Val Leu Ser Gln Glu Arg Gln 115 120 125
Pro Phe Phe His Thr Arg Cys Glu Glu Asp Asn Gly Gly Cys Ser His 130 135 140
Leu Cys Leu Leu Ser Pro Ser Glu Pro Phe Tyr Thr Cys Ala Cys Pro 145 150 155 160
Thr Gly Val Gln Leu Gln Asp Asn Gly Arg Thr Cys Lys Ala Gly Ala 165 170 175
Glu Glu Val Leu Leu Leu Ala Arg Arg Thr Asp Leu Arg Arg Ile Ser 180 185 190
Leu Asp Thr Pro Asp Phe Thr Asp Ile Val Leu Gln Val Asp Asp Ile 195 200 205
Arg His Ala Ile Ala Ile Asp Tyr Asp Pro Leu Glu Gly Tyr Val Tyr 210 215 220
Trp Thr Asp Asp Glu Val Arg Ala Ile Arg Arg Ala Tyr Leu Asp Gly 225 230 235 240
Ser Gly Ala Gln Thr Leu Val Asn Thr Glu Ile Asn Asp Pro Asp Gly 245 250 255
Ile Ala Val Asp Trp Val Ala Arg Asn Leu Tyr Trp Thr Asp Thr Gly 260 265 270
Thr Asp Arg Ile Glu Val Thr Arg Leu Asn Gly Thr Ser Arg Lys Ile 275 280 285
Leu Val Ser Glu Asp Leu Asp Glu Pro Arg Ala Ile Ala Leu His Pro 290 295 300
Val Met Gly Leu Met Tyr Trp Thr Asp Trp Gly Glu Asn Pro Lys Ile 305 310 315 320
Glu Cys Ala Asn Leu Asp Gly Gln Glu Arg Arg Val Leu Val Asn Ala 325 330 335
Ser Leu Gly Trp Pro Asn Gly Leu Ala Leu Asp Leu Gln Glu Gly Lys 340 345 350
Leu Tyr Trp Gly Asp Ala Lys Thr Asp Lys Ile Glu Val Ile Asn Val 355 360 365
Asp Gly Thr Lys Arg Arg Thr Leu Leu Glu Asp Lys Leu Pro His Ile 370 375 380
Phe Gly Phe Thr Leu Leu Gly Asp Phe Ile Tyr Trp Thr Asp Trp Gln 385 390 395 400
Arg Arg Ser Ile Glu Arg Val His Lys Val Lys Ala Ser Arg Asp Val 405 410 415
Ile Ile Asp Gln Leu Pro Asp Leu Met Gly Leu Lys Ala Val Asn Val 420 425 430
Ala Lys Val Val Gly Thr Asn Pro Cys Ala Asp Arg Asn Gly Gly Cys 435 440 445
Ser His Leu Cys Phe Phe Thr Pro His Ala Thr Arg Cys Gly Cys Pro 450 455 460
Ile Gly Leu Glu Leu Leu Ser Asp Met Lys Thr Cys Ile Val Pro Glu 465 470 475 480
Ala Phe Leu Val Phe Thr Ser Arg Ala Ala Ile His Arg Ile Ser Leu 485 490 495
Glu Thr Asn Asn Asn Asp Val Ala Ile Pro Leu Thr Gly Val Lys Glu 500 505 510
Ala Ser Ala Leu Asp Phe Asp Val Ser Asn Asn His Ile Tyr Trp Thr 515 520 525
Asp Val Ser Leu Lys Thr Ile Ser Arg Ala Phe Met Asn Gly Ser Ser 530 535 540
Val Glu His Val Val Glu Phe Gly Leu Asp Tyr Pro Glu Gly Met Ala 545 550 555 560
Val Asp Trp Met Gly Lys Asn Leu Tyr Trp Ala Asp Thr Gly Thr Asn 565 570 575
Arg Ile Glu Val Ala Arg Leu Asp Gly Gln Phe Arg Gln Val Leu Val 580 585 590
Trp Arg Asp Leu Asp Asn Pro Arg Ser Leu Ala Leu Asp Pro Thr Lys 595 600 605
Gly Tyr Ile Tyr Trp Thr Glu Trp Gly Gly Lys Pro Arg Ile Val Arg 610 615 620
Ala Phe Met Asp Gly Thr Asn Cys Met Thr Leu Val Asp Lys Val Gly 625 630 635 640
Arg Ala Asn Asp Leu Thr Ile Asp Tyr Ala Asp Gln Arg Leu Tyr Trp 645 650 655
Thr Asp Leu Asp Thr Asn Met Ile Glu Ser Ser Asn Met Leu Gly Gln 660 665 670
Glu Arg Val Val Ile Ala Asp Asp Leu Pro His Pro Phe Gly Leu Thr 675 680 685
Gln Tyr Ser Asp Tyr Ile Tyr Trp Thr Asp Trp Asn Leu His Ser Ile 690 695 700
Glu Arg Ala Asp Lys Thr Ser Gly Arg Asn Arg Thr Leu Ile Gln Gly 705 710 715 720
His Leu Asp Phe Val Met Asp Ile Leu Val Phe His Ser Ser Arg Gln 725 730 735
Asp Gly Leu Asn Asp Cys Met His Asn Asn Gly Gln Cys Gly Gln Leu 740 745 750
Cys Leu Ala Ile Pro Gly Gly His Arg Cys Gly Cys Ala Ser His Tyr 755 760 765
Thr Leu Asp Pro Ser Ser Arg Asn Cys Ser Pro Pro Thr Thr Phe Leu 770 775 780
Leu Phe Ser Gln Lys Ser Ala Ile Ser Arg Met Ile Pro Asp Asp Gln 785 790 795 800
His Ser Pro Asp Leu Ile Leu Pro Leu His Gly Leu Arg Asn Val Lys 805 810 815
Ala Ile Asp Tyr Asp Pro Leu Asp Lys Phe Ile Tyr Trp Val Asp Gly 820 825 830
Arg Gln Asn Ile Lys Arg Ala Lys Asp Asp Gly Thr Gln Pro Phe Val 835 840 845
Leu Thr Ser Leu Ser Gln Gly Gln Asn Pro Asp Arg Gln Pro His Asp 850 855 860
Leu Ser Ile Asp Ile Tyr Ser Arg Thr Leu Phe Trp Thr Cys Glu Ala 865 870 875 880
Thr Asn Thr Ile Asn Val His Arg Leu Ser Gly Glu Ala Met Gly Val 885 890 895
Val Leu Arg Gly Asp Arg Asp Lys Pro Arg Ala Ile Val Val Asn Ala 900 905 910
Glu Arg Gly Tyr Leu Tyr Phe Thr Asn Met Gln Asp Arg Ala Ala Lys 915 920 925
Ile Glu Arg Ala Ala Leu Asp Gly Thr Glu Arg Glu Val Leu Phe Thr 930 935 940
Thr Gly Leu Ile Arg Pro Val Ala Leu Val Val Asp Asn Thr Leu Gly 945 950 955 960
Lys Leu Phe Trp Val Asp Ala Asp Leu Lys Arg Ile Glu Ser Cys Asp 965 970 975
Leu Ser Gly Ala Asn Arg Leu Thr Leu Glu Asp Ala Asn Ile Val Gln 980 985 990
Pro Leu Gly Leu Thr Ile Leu Gly Lys His Leu Tyr Trp Ile Asp Arg 995 1000 1005
Gln Gln Gln Met Ile Glu Arg Val Glu Lys Thr Thr Gly Asp Lys Arg 1010 1015 1020
Thr Arg Ile Gln Gly Arg Val Ala His Leu Thr Gly Ile His Ala Val 1025 1030 1035 1040
Glu Glu Val Ser Leu Glu Glu Phe Ser Ala His Pro Cys Ala Arg Asp 1045 1050 1055
Asn Gly Gly Cys Ser His Ile Cys Ile Ala Lys Gly Asp Gly Thr Pro 1060 1065 1070
Arg Cys Ser Cys Pro Val His Leu Val Leu Leu Gln Asn Leu Leu Thr 1075 1080 1085
Cys Gly Glu Pro Pro Thr Cys Ser Pro Asp Gln Phe Ala Cys Ala Thr 1090 1095 1100
Gly Glu Ile Asp Cys Ile Pro Gly Ala Trp Arg Cys Asp Gly Phe Pro 1105 1110 1115 1120
Glu Cys Asp Asp Gln Ser Asp Glu Glu Gly Cys Pro Val Cys Ser Ala 1125 1130 1135
Ala Gln Phe Pro Cys Ala Arg Gly Gln Cys Val Asp Leu Arg Leu Arg 1140 1145 1150
Cys Asp Gly Glu Ala Asp Cys Gln Asp Arg Ser Asp Glu Ala Asp Cys 1155 1160 1165
Asp Ala Ile Cys Leu Pro Asn Gln Phe Arg Cys Ala Ser Gly Gln Cys 1170 1175 1180
Val Leu Ile Lys Gln Gln Cys Asp Ser Phe Pro Asp Cys Ile Asp Gly 1185 1190 1195 1200
Ser Asp Glu Leu Met Cys Glu Ile Thr Lys Pro Pro Ser Asp Asp Ser 1205 1210 1215
Pro Ala His Ser Ser Ala Ile Gly Pro Val Ile Gly Ile Ile Leu Ser 1220 1225 1230
Leu Phe Val Met Gly Gly Val Tyr Phe Val Cys Gln Arg Val Val Cys 1235 1240 1245
Gln Arg Tyr Ala Gly Ala Asn Gly Pro Phe Pro His Glu Tyr Val Ser 1250 1255 1260
Gly Thr Pro His Val Pro Leu Asn Phe Ile Ala Pro Gly Gly Ser Gln 1265 1270 1275 1280
His Gly Pro Phe Thr Gly Ile Ala Cys Gly Lys Ser Met Met Ser Ser 1285 1290 1295
Val Ser Leu Met Gly Gly Arg Gly Gly Val Pro Leu Tyr Asp Arg Asn 1300 1305 1310
His Val Thr Gly Ala Ser Ser Ser Ser Ser Ser Ser Thr Lys Ala Thr 1315 1320 1325
Leu Tyr Pro Pro Ile Leu Asn Pro Pro Pro Ser Pro Ala Thr Asp Pro 1330 1335 1340
Ser Leu Tyr Asn Met Asp Met Phe Tyr Ser Ser Asn Ile Pro Ala Thr 1345 1350 1355 1360
Val Arg Pro Tyr Arg Pro Tyr Ile Ile Arg Gly Met Ala Pro Pro Thr 1365 1370 1375
Thr Pro Cys Ser Thr Asp Val Cys Asp Ser Asp Tyr Ser Ala Ser Arg 1380 1385 1390
Trp Lys Ala Ser Lys Tyr Tyr Leu Asp Leu Asn Ser Asp Ser Asp Pro 1395 1400 1405
Tyr Pro Pro Pro Pro Thr Pro His Ser Gln Tyr Leu Ser Ala Glu Asp 1410 1415 1420
Ser Cys Pro Pro Ser Pro Ala Thr Glu Arg Ser Tyr Phe His Leu Phe 1425 1430 1435 1440
Pro Pro Pro Pro Ser Pro Cys Thr Asp Ser Ser 1445 1450

SEQ ID NO: 26 (length: 5125 base pairs; type: nucleic acid; strandedness: single; topology: linear)
TAAATGGCTT GGCAAAGGGA GTTCATTCCT TTTAGCGCTT CCATCTTCTG CAGTGAGAGG 60
ACACCGCATT CTTCTTCTCC AGAGGATGCA GCAGCAAGGC GCCATCTTGA AACCAGAGAC 120
CAAACCAACC AGCAACTTCG TCTTGAACTT CCCAGCCTCC ACAACTCCTC GCCGCTCCTG 180
CTATTTGCCA ACCGCCGGGA CGTACGGCTG GTGGACGCCG GCGGAGTCAA GCTGGAGTCC 240
ACCATCGTGG TCAGCGGCCT GGAGGATGCG GCCGCAGTGG ACTTCCAGTT TTCCAAGGGA 300
GCCGTGTACT GGACAGACGT GAGCGAGGAG GCCATCAAGC AGACCTACCT GAACCAGACG 360
GGGGCCGCCG TGCAGAACGT GGTCATCTCC GGCCTGGTCT CTCCCGACGG CCTCGCCTGC 420
GACTGGGTGG GCAAGAAGCT GTACTGGACG GACTCAGAGA CCAACCGCAT CGAGGTGGCC 480
AACCTCAATG GCACATCCCG GAAGGTGCTC TTCTGGCAGG ACCTTGACCA GCCGAGGGCC 540
ATCGCCTTGG ACCCCGCTCA CGGGTACATG TACTGGACAG ACTGGGGTGA GACGCCCCGG 600
ATTGAGCGGG CAGGGATGGA TGGCAGCACC CGGAAGATCA TTGTGGACTC GGACATTTAC 660
TGGCCCAATG GACTGACCAT CGACCTGGAG GAGCAGAAGC TCTACTGGGC TGACGCCAAG 720
CTCAGCTTCA TCCACCGTGC CAACCTGGAC GGCTCGTTCC GGCAGAAGGT GGTGGAGGGC 780
AGCCTGACGC ACCCCTTCGC CCTGACGCTC TCCGGGGACA CTCTGTACTG GACAGACTGG 840
CAGACCCGCT CCATCCATGC CTGCAACAAG CGCACTGGGG GGAAGAGGAA GGAGATCCTG 900
AGTGCCCTCT ACTCACCCAT GGACATCCAG GTGCTGAGCC AGGAGCGGCA GCCTTTCTTC 960
CACACTCGCT GTGAGGAGGA CAATGGCGGC TGCTCCCACC TGTGCCTGCT GTCCCCAAGC 1020
GAGCCTTTCT ACACATGCGC CTGCCCCACG GGTGTGCAGC TGCAGGACAA CGGCAGGACG 1080
TGTAAGGCAG GAGCCGAGGA GGTGCTGCTG CTGGCCCGGC GGACGGACCT ACGGAGGATC 1140
TCGCTGGACA CGCCGGACTT TACCGACATC GTGCTGCAGG TGGACGACAT CCGGCACGCC 1200
ATTGCCATCG ACTACGACCC GCTAGAGGGC TATGTCTACT GGACAGATGA CGAGGTGCGG 1260
GCCATCCGCA GGGCGTACCT GGACGGGTCT GGGGCGCAGA CGCTGGTCAA CACCGAGATC 1320
AACGACCCCG ATGGCATCGC GGTCGACTGG GTGGCCCGAA ACCTCTACTG GACCGACACG 1380
GGCACGGACC GCATCGAGGT GACGCGCCTC AACGGCACCT CCCGCAAGAT CCTGGTGTCG 1440
GAGGACCTGG ACGAGCCCCG AGCCATCGCA CTGCACCCCG TGATGGGCCT CATGTACTGG 1500
ACAGACTGGG GAGAGAACCC TAAAATCGAG TGTGCCAACT TGGATGGGCA GGAGCGGCGT 1560
GTGCTGGTCA ATGCCTCCCT CGGGTGGCCC AACGGCCTGG CCCTGGACCT GCAGGAGGGG 1620
AAGCTCTACT GGGGAGACGC CAAGACAGAC AAGATCGAGG TGATCAATGT TGATGGGACG 1680
AAGAGGCGGA CCCTCCTGGA GGACAAGCTC CCGCACATTT TCGGGTTCAC GCTGCTGGGG 1740
GACTTCATCT ACTGGACTGA CTGGCAGCGC CGCAGCATCG AGCGGGTGCA CAAGGTCAAG 1800
GCCAGCCGGG ACGTCATCAT TGACCAGCTG CCCGACCTGA TGGGGCTCAA AGCTGTGAAT 1860
GTGGCCAAGG TCGTCGGAAC CAACCCGTGT GCGGACAGGA ACGGGGGGTG CAGCCACCTG 1920
TGCTTCTTCA CACCCCACGC AACCCGGTGT GGCTGCCCCA TCGGCCTGGA GCTGCTGAGT 1980
GACATGAAGA CCTGCATCGT GCCTGAGGCC TTCTTGGTCT TCACCAGCAG AGCCGCCATC 2040
CACAGGATTC CCTCGAGACC AATAACAACG ACGTGGCCAT CCCGCTCACG GGCGTCAAGG 2100
AGGCCTCAGC CCTGGACTTT GATGTGTCCA ACAACCACAT CTACTGGACA GACGTCAGCC 2160
TGAAGACCAT CAGCCGCGCC TTCATGAACG GGAGCTCGGT GGAGCACGTG GTGGAGTTTG 2220
GCCTTGACTA CCCCGAGGGC ATGGCCGTTG ACTGGATGGG CAAGAACCTC TACTGGGCCG 2280
ACACTGGGAC CAACAGAATC GAAGTGGCGC GGCTGGACGG GCAGTTCCGG CAAGTCCTCG 2340
TGTGGAGGGA CTTGGACAAC CCGAGGTCGC TGGCCCTGGA TCCCACCAAG GGCTACATCT 2400
ACTGGACCGA GTGGGGCGGC AAGCCGAGGA TCGTGCGGGC CTTCATGGAC GGGACCAACT 2460
GCATGACGCT GGTGGACAAG GTGGGCCGGG CCAACGACCT CACCATTGAC TACGCTGACC 2520
AGCGCCTCTA CTGGACCGAC CTGGACACCA ACATGATCGA GTCGTCCAAC ATGCTGGGTC 2580
AGGAGCGGGT CGTGATTGCC GACGATCTCC CGCACCCGTT CGGTCTGACG CAGTACAGCG 2640
ATTATATCTA CTGGACAGAC TGGAATCTGC ACAGCATTGA GCGGGCCGAC AAGACTAGCG 2700
GCCGGAACCG CACCCTCATC CAGGGCCACC TGGACTTCGT GATGGACATC CTGGTGTTCC 2760
ACTCCTCCCG CCAGGATGGC CTCAATGACT GTATGCACAA CAACGGGCAG TGTGGGCAGC 2820
TGTGCCTTGC CATCCCCGGC GGCCACCGCT GCGGCTGCGC CTCACACTAC ACCCTGGACC 2880
CCAGCAGCCG CAACTGCAGC CCGCCCACCA CCTTCTTGCT GTTCAGCCAG AAATCTGCCA 2940
TCAGTCGGAT GATCCCGGAC GACCAGCACA GCCCGGATCT CATCCTGCCC CTGCATGGAC 3000
TGAGGAACGT CAAAGCCATC GACTATGACC CACTGGACAA GTTCATCTAC TGGGTGGATG 3060
GGCGCCAGAA CATCAAGCGA GCCAAGGACG ACGGGACCCA GCCCTTTGTT TTGACCTCTC 3120
TGAGCCAAGG CCAAAACCCA GACAGGCAGC CCCACGACCT CAGCATCGAC ATCTACAGCC 3180
GGACACTGTT CTGGACGTGC GAGGCCACCA ATACCATCAA CGTCCACAGG CTGAGCGGGG 3240
AAGCCATGGG GGTGGTGCTG CGTGGGGACC GCGACAAGCC CAGGGCCATC GTCGTCAACG 3300
CGGAGCGAGG GTACCTGTAC TTCACCAACA TGCAGGACCG GGCAGCCAAG ATCGAACGCG 3360
CAGCCCTGGA CGGCACCGAG CGCGAGGTCC TCTTCACCAC CGGCCTCATC CGCCCTGTGG 3420
CCCTGGTGGT AGACAACACA CTGGGCAAGC TGTTCTGGGT GGACGCGGAC CTGAAGCGCA 3480
TTGAGAGCTG TGACCTGTCA GGGGCCAACC GCCTGACCCT GGAGGACGCC AACATCGTGC 3540
AGCCTCTGGG CCTGACCATC CTTGGCAAGC ATCTCTACTG GATCGACCGC CAGCAGCAGA 3600
TGATCGAGCG TGTGGAGAAG ACCACCGGGG ACAAGCGGAC TCGCATCCAG GGCCGTGTCG 3660
CCCACCTCAC TGGCATCCAT GCAGTGGAGG AAGTCAGCCT GGAGGAGTTC TCAGCCCACC 3720
CATGTGCCCG TGACAATGGT GGCTGCTCCC ACATCTGTAT TGCCAAGGGT GATGGGACAC 3780
CACGGTGCTC ATGCCCAGTC CACCTCGTGC TCCTGCAGAA CCTGCTGACC TGTGGAGAGC 3840
CGCCCACCTG CTCCCCGGAC CAGTTTGCAT GTGCCACAGG GGAGATCGAC TGTATCCCCG 3900
GGGCCTGGCG CTGTGACGGC TTTCCCGAGT GCGATGACCA GAGCGACGAG GAGGGCTGCC 3960
CCGTGTGCTC CGCCGCCCAG TTCCCCTGCG CGCGGGGTCA GTGTGTGGAC CTGCGCCTGC 4020
GCTGCGACGG CGAGGCAGAC TGTCAGGACC GCTCAGACGA GGCGGACTGT GACGCCATCT 4080
GCCTGCCCAA CCAGTTCCGG TGTGCGAGCG GCAGTGTGTC CTCATCAAAC AGCAGTGCGA 4140
CTCCTTCCCC GACTGTATCG ACGGCTCCGA CGAGCTCATG TGTGAAATCA CCAAGCCGCC 4200
CTCAGACGAC AGCCCGGCCC ACAGCAGTGC CATCGGGCCC GTCATTGGCA TCATCCTCTC 4260
TCTCTTCGTC ATGGGTGGTG TCTATTTTGT GTGCCAGCGC GTGGTGTGCC AGCGCTATGC 4320
GGGGGCCAAC GGGCCCTTCC CGCACGAGTA TGTCAGCGGG ACCCCGCACG TGCCCCTCAA 4380
TTTCATAGCC CCGGGCGGTT CCCAGCATGG CCCCTTCACA GGCATCGCAT GCGGAAAGTC 4440
CATGATGAGC TCCGTGAGCC TGATGGGGGG CCGGGGCGGG GTGCCCCTCT ACGACCGGAA 4500
CCACGTCACA GGGGCCTCGT CCAGCAGCTC GTCCAGCACG AAGGCCACGC TGTACCCGCC 4560
GATCCTGAAC CCGCCGCCCT CCCCGGCCAC GGACCCCTCC CTGTACAACA TGGACATGTT 4620
CTACTCTTCA AACATTCCGG CCACTGCGAG ACCGTACAGG CCCTACATCA TTCGAGGAAT 4680
GGCGCCCCCG ACGACGCCCT GCAGCACCGA CGTGTGTGAC AGCGACTACA GCGCCAGCCG 4740
CTGGAAGGCC AGCAAGTACT ACCTGGATTT GAACTCGGAC TCAGACCCCT ATCCACCCCC 4800
ACCCACGCCC CACAGCCAGT ACCTGTCGGC GGAGGACAGC TGCCCGCCCT CGCCCGCCAC 4860
CGAGAGGAGC TACTTCCATC TCTTCCCGCC CCCTCCGTCC CCCTGCACGG ACTCATCCTG 4920
ACCTCGGCCG GGCCACTCTG GCTTCTCTGT GCCCCTGTAA ATAGTTTTAA ATATGAACAA 4980
AGAAAAAAAT ATATTTTATG ATTTAAAAAA TAAATATAAT TGGGATTTTA AAAACATGAG 5040
AAATGTGAAC TGTGATGGGG TGGGCAGGGC TGGGAGAACT TTGTACAGTG GAACAAATAT 5100
TTATAAACTT AATTTTGTAA AACAG 5125

SEQ ID NO: 27 (length: 167 base pairs; type: nucleic acid; strandedness: single; topology: linear)
TAAAATGGCT TGGCAAAGGG AGTTCATTCC TTTTAGCGCT TCCATCTTCT GCAGTGAGAG 60
GACACCGCAT TCTTCTTCTC CAGAGGATGC AGCAGCAAGG CGCCATCTTG AAACCAGAGA 120
CCAAACCAAC CAGCAACTTC GTCTTGAACT TCCCAGCCTC CACAACT 167

SEQ ID NO: 28 (length: 4915 base pairs; type: nucleic acid; strandedness: single; topology: linear)
ATGGCTTGGC AAAGGGAGTT CATTCCTTTT AGCGCTTCCA TCTTCTGCAG TGAGAGGACA 60
CCGCATTCTT CTTCTCCAGA GGATGCAGCA GCAAGGCGCC ATCTTGAAAC CAGAGACCAA 120
ACCAACCAGC AACTTCGTCT TGAACTTCCC AGCCTCCACA ACTCCTCGCC GCTCCTGCTA 180
TTTGCCAACC GCCGGGACGT ACGGCTGGTG GACGCCGGCG GAGTCAAGCT GGAGTCCACC 240
ATCGTGGTCA GCGGCCTGGA GGATGCGGCC GCAGTGGACT TCCAGTTTTC CAAGGGAGCC 300
GTGTACTGGA CAGACGTGAG CGAGGAGGCC ATCAAGCAGA CCTACCTGAA CCAGACGGGG 360
GCCGCCGTGC AGAACGTGGT CATCTCCGGC CTGGTCTCTC CCGACGGCCT CGCCTGCGAC 420
TGGGTGGGCA AGAAGCTGTA CTGGACGGAC TCAGAGACCA ACCGCATCGA GGTGGCCAAC 480
CTCAATGGCA CATCCCGGAA GGTGCTCTTC TGGCAGGACC TTGACCAGCC GAGGGCCATC 540
GCCTTGGACC CCGCTCACGG GTACATGTAC TGGACAGACT GGGGTGAGAC GCCCCGGATT 600
GAGCGGGCAG GGATGGATGG CAGCACCCGG AAGATCATTG TGGACTCGGA CATTTACTGG 660
CCCAATGGAC TGACCATCGA CCTGGAGGAG CAGAAGCTCT ACTGGGCTGA CGCCAAGCTC 720
AGCTTCATCC ACCGTGCCAA CCTGGACGGC TCGTTCCGGC AGAAGGTGGT GGAGGGCAGC 780
CTGACGCACC CCTTCGCCCT GACGCTCTCC GGGGACACTC TGTACTGGAC AGACTGGCAG 840
ACCCGCTCCA TCCATGCCTG CAACAAGCGC ACTGGGGGGA AGAGGAAGGA GATCCTGAGT 900
GCCCTCTACT CACCCATGGA CATCCAGGTG CTGAGCCAGG AGCGGCAGCC TTTCTTCCAC 960
ACTCGCTGTG AGGAGGACAA TGGCGGCTGC TCCCACCTGT GCCTGCTGTC CCCAAGCGAG
1020 CCTTTCTACA CATGCGCCTGCCCCACGGGT GTGCAGCTGC AGGACAACGG CAGGACGTGT 1080 AAGGCAGGAG CCGAGGAGGTGCTGCTGCTG GCCCGGCGGA CGGACCTACG GAGGATCTCG 1140 CTGGACACGC CGGACTTTACCGACATCGTG CTGCAGGTGG ACGACATCCG GCACGCCATT 1200 GCCATCGACT ACGACCCGCTAGAGGGCTAT GTCTACTGGA CAGATGACGA GGTGCGGGCC 1260 ATCCGCAGGG CGTACCTGGACGGGTCTGGG GCGCAGACGC TGGTCAACAC CGAGATCAAC 1320 GACCCCGATG GCATCGCGGTCGACTGGGTG GCCCGAAACC TCTACTGGAC CGACACGGGC 1380 ACGGACCGCA TCGAGGTGACGCGCCTCAAC GGCACCTCCC GCAAGATCCT GGTGTCGGAG 1440 GACCTGGACG AGCCCCGAGCCATCGCACTG CACCCCGTGA TGGGCCTCAT GTACTGGACA 1500 GACTGGGGAG AGAACCCTAAAATCGAGTGT GCCAACTTGG ATGGGCAGGA GCGGCGTGTG 1560 CTGGTCAATG CCTCCCTCGGGTGGCCCAAC GGCCTGGCCC TGGACCTGCA GGAGGGGAAG 1620 CTCTACTGGG GAGACGCCAAGACAGACAAG ATCGAGGTGA TCAATGTTGA TGGGACGAAG 1680 AGGCGGACCC TCCTGGAGGACAAGCTCCCG CACATTTTCG GGTTCACGCT GCTGGGGGAC 1740 TTCATCTACT GGACTGACTGGCAGCGCCGC AGCATCGAGC GGGTGCACAA GGTCAAGGCC 1800 AGCCGGGACG TCATCATTGACCAGCTGCCC GACCTGATGG GGCTCAAAGC TGTGAATGTG 1860 GCCAAGGTCG TCGGAACCAACCCGTGTGCG GACAGGAACG GGGGGTGCAG CCACCTGTGC 1920 TTCTTCACAC CCCACGCAACCCGGTGTGGC TGCCCCATCG GCCTGGAGCT GCTGAGTGAC 1980 ATGAAGACCT GCATCGTGCCTGAGGCCTTC TTGGTCTTCA CCAGCAGAGC CGCCATCCAC 2040 AGGATCTCCT CGAGACCAATAACAACGACG TGGCCATCCC GCTCACGGGC GTCAAGGAGG 2100 CCTCAGCCCT GGACTTTGATGTGTCCAACA ACCACATCTA CTGGACAGAC GTCAGCCTGA 2160 AGACCATCAG CCGCGCCTTCATGAACGGGA GCTCGGTGGA GCACGTGGTG GAGTTTGGCC 2220 TTGACTACCC CGAGGGCATGGCCGTTGACT GGATGGGCAA GAACCTCTAC TGGGCCGACA 2280 CTGGGACCAA CAGAATCGAAGTGGCGCGGC TGGACGGGCA GTTCCGGCAA GTCCTCGTGT 2340 GGAGGGACTT GGACAACCCGAGGTCGCTGG CCCTGGATCC CACCAAGGGC TACATCTACT 2400 GGACCGAGTG GGGCGGCAAGCCGAGGATCG TGCGGGCCTT CATGGACGGG ACCAACTGCA 2460 TGACGCTGGT GGACAAGGTGGGCCGGGCCA ACGACCTCAC CATTGACTAC GCTGACCAGC 2520 GCCTCTACTG GACCGACCTGGACACCAACA TGATCGAGTC GTCCAACATG CTGGGTCAGG 2580 AGCGGGTCGT GATTGCCGACGATCTCCCGC ACCCGTTCGG TCTGACGCAG TACAGCGATT 2640 ATATCTACTG GACAGACTGGAATCTGCACA GCATTGAGCG GGCCGACAAG ACTAGCGGCC 2700 GGAACCGCAC CCTCATCCAGGGCCACCTGG 
ACTTCGTGAT GGACATCCTG GTGTTCCACT 2760 CCTCCCGCCA GGATGGCCTCAATGACTGTA TGCACAACAA CGGGCAGTGT GGGCAGCTGT 2820 GCCTTGCCAT CCCCGGCGGCCACCGCTGCG GCTGCGCCTC ACACTACACC CTGGACCCCA 2880 GCAGCCGCAA CTGCAGCCCGCCCACCACCT TCTTGCTGTT CAGCCAGAAA TCTGCCATCA 2940 GTCGGATGAT CCCGGACGACCAGCACAGCC CGGATCTCAT CCTGCCCCTG CATGGACTGA 3000 GGAACGTCAA AGCCATCGACTATGACCCAC TGGACAAGTT CATCTACTGG GTGGATGGGC 3060 GCCAGAACAT CAAGCGAGCCAAGGACGACG GGACCCAGCC CTTTGTTTTG ACCTCTCTGA 3120 GCCAAGGCCA AAACCCAGACAGGCAGCCCC ACGACCTCAG CATCGACATC TACAGCCGGA 3180 CACTGTTCTG GACGTGCGAGGCCACCAATA CCATCAACGT CCACAGGCTG AGCGGGGAAG 3240 CCATGGGGGT GGTGCTGCGTGGGGACCGCG ACAAGCCCAG GGCCATCGTC GTCAACGCGG 3300 AGCGAGGGTA CCTGTACTTCACCAACATGC AGGACCGGGC AGCCAAGATC GAACGCGCAG 3360 CCCTGGACGG CACCGAGCGCGAGGTCCTCT TCACCACCGG CCTCATCCGC CCTGTGGCCC 3420 TGGTGGTAGA CAACACACTGGGCAAGCTGT TCTGGGTGGA CGCGGACCTG AAGCGCATTG 3480 AGAGCTGTGA CCTGTCAGGGGCCAACCGCC TGACCCTGGA GGACGCCAAC ATCGTGCAGC 3540 CTCTGGGCCT GACCATCCTTGGCAAGCATC TCTACTGGAT CGACCGCCAG CAGCAGATGA 3600 TCGAGCGTGT GGAGAAGACCACCGGGGACA AGCGGACTCG CATCCAGGGC CGTGTCGCCC 3660 ACCTCACTGG CATCCATGCAGTGGAGGAAG TCAGCCTGGA GGAGTTCTCA GCCCACCCAT 3720 GTGCCCGTGA CAATGGTGGCTGCTCCCACA TCTGTATTGC CAAGGGTGAT GGGACACCAC 3780 GGTGCTCATG CCCAGTCCACCTCGTGCTCC TGCAGAACCT GCTGACCTGT GGAGAGCCGC 3840 CCACCTGCTC CCCGGACCAGTTTGCATGTG CCACAGGGGA GATCGACTGT ATCCCCGGGG 3900 CCTGGCGCTG TGACGGCTTTCCCGAGTGCG ATGACCAGAG CGACGAGGAG GGCTGCCCCG 3960 TGTGCTCCGC CGCCCAGTTCCCCTGCGCGC GGGGTCAGTG TGTGGACCTG CGCCTGCGCT 4020 GCGACGGCGA GGCAGACTGTCAGGACCGCT CAGACGAGGC GGACTGTGAC GCCATCTGCC 4080 TGCCCAACCA GTTCCGGTGTGCGAGCGGCA GTGTGTCCTC ATCAAACAGC AGTGCGACTC 4140 CTTCCCCGAC TGTATCGACGGCTCCGACGA GCTCATGTGT GAAATCACCA AGCCGCCCTC 4200 AGACGACAGC CCGGCCCACAGCAGTGCCAT CGGGCCCGTC ATTGGCATCA TCCTCTCTCT 4260 CTTCGTCATG GGTGGTGTCTATTTTGTGTG CCAGCGCGTG GTGTGCCAGC GCTATGCGGG 4320 GGCCAACGGG CCCTTCCCGCACGAGTATGT CAGCGGGACC CCGCACGTGC CCCTCAATTT 4380 CATAGCCCCG GGCGGTTCCCAGCATGGCCC CTTCACAGGC ATCGCATGCG GAAAGTCCAT 4440 
GATGAGCTCC GTGAGCCTGATGGGGGGCCG GGGCGGGGTG CCCCTCTACG ACCGGAACCA 4500 CGTCACAGGG GCCTCGTCCAGCAGCTCGTC CAGCACGAAG GCCACGCTGT ACCCGCCGAT 4560 CCTGAACCCG CCGCCCTCCCCGGCCACGGA CCCCTCCCTG TACAACATGG ACATGTTCTA 4620 CTCTTCAAAC ATTCCGGCCACTGCGAGACC GTACAGGCCC TACATCATTC GAGGAATGGC 4680 GCCCCCGACG ACGCCCTGCAGCACCGACGT GTGTGACAGC GACTACAGCG CCAGCCGCTG 4740 GAAGGCCAGC AAGTACTACCTGGATTTGAA CTCGGACTCA GACCCCTATC CACCCCCACC 4800 CACGCCCCAC AGCCAGTACCTGTCGGCGGA GGACAGCTGC CCGCCCTCGC CCGCCACCGA 4860 GAGGAGCTAC TTCCATCTCTTCCCGCCCCC TCCGTCCCCC TGCACGGACT CATCC 4915 1639 amino acids amino acidlinear 29 Met Ala Trp Gln Arg Glu Phe Ile Pro Phe Ser Ala Ser Ile PheCys 1 5 10 15 Ser Glu Arg Thr Pro His Ser Ser Ser Pro Glu Asp Ala AlaAla Arg 20 25 30 Arg His Leu Glu Thr Arg Asp Gln Thr Asn Gln Gln Leu ArgLeu Glu 35 40 45 Leu Pro Ser Leu His Asn Ser Ser Pro Leu Leu Leu Phe AlaAsn Arg 50 55 60 Arg Asp Val Arg Leu Val Asp Ala Gly Gly Val Lys Leu GluSer Thr 65 70 75 80 Ile Val Val Ser Gly Leu Glu Asp Ala Ala Ala Val AspPhe Gln Phe 85 90 95 Ser Lys Gly Ala Val Tyr Trp Thr Asp Val Ser Glu GluAla Ile Lys 100 105 110 Gln Thr Tyr Leu Asn Gln Thr Gly Ala Ala Val GlnAsn Val Val Ile 115 120 125 Ser Gly Leu Val Ser Pro Asp Gly Leu Ala CysAsp Trp Val Gly Lys 130 135 140 Lys Leu Tyr Trp Thr Asp Ser Glu Thr AsnArg Ile Glu Val Ala Asn 145 150 155 160 Leu Asn Gly Thr Ser Arg Lys ValLeu Phe Trp Gln Asp Leu Asp Gln 165 170 175 Pro Arg Ala Ile Ala Leu AspPro Ala His Gly Tyr Met Tyr Trp Thr 180 185 190 Asp Trp Gly Glu Thr ProArg Ile Glu Arg Ala Gly Met Asp Gly Ser 195 200 205 Thr Arg Lys Ile IleVal Asp Ser Asp Ile Tyr Trp Pro Asn Gly Leu 210 215 220 Thr Ile Asp LeuGlu Glu Gln Lys Leu Tyr Trp Ala Asp Ala Lys Leu 225 230 235 240 Ser PheIle His Arg Ala Asn Leu Asp Gly Ser Phe Arg Gln Lys Val 245 250 255 ValGlu Gly Ser Leu Thr His Pro Phe Ala Leu Thr Leu Ser Gly Asp 260 265 270Thr Leu Tyr Trp Thr Asp Trp Gln Thr Arg Ser Ile His Ala Cys Asn 275 280285 Lys Arg Thr Gly Gly Lys Arg Lys Glu Ile Leu Ser Ala Leu Tyr Ser 
290295 300 Pro Met Asp Ile Gln Val Leu Ser Gln Glu Arg Gln Pro Phe Phe His305 310 315 320 Thr Arg Cys Glu Glu Asp Asn Gly Gly Cys Ser His Leu CysLeu Leu 325 330 335 Ser Pro Ser Glu Pro Phe Tyr Thr Cys Ala Cys Pro ThrGly Val Gln 340 345 350 Leu Gln Asp Asn Gly Arg Thr Cys Lys Ala Gly AlaGlu Glu Val Leu 355 360 365 Leu Leu Ala Arg Arg Thr Asp Leu Arg Arg IleSer Leu Asp Thr Pro 370 375 380 Asp Phe Thr Asp Ile Val Leu Gln Val AspAsp Ile Arg His Ala Ile 385 390 395 400 Ala Ile Asp Tyr Asp Pro Leu GluGly Tyr Val Tyr Trp Thr Asp Asp 405 410 415 Glu Val Arg Ala Ile Arg ArgAla Tyr Leu Asp Gly Ser Gly Ala Gln 420 425 430 Thr Leu Val Asn Thr GluIle Asn Asp Pro Asp Gly Ile Ala Val Asp 435 440 445 Trp Val Ala Arg AsnLeu Tyr Trp Thr Asp Thr Gly Thr Asp Arg Ile 450 455 460 Glu Val Thr ArgLeu Asn Gly Thr Ser Arg Lys Ile Leu Val Ser Glu 465 470 475 480 Asp LeuAsp Glu Pro Arg Ala Ile Ala Leu His Pro Val Met Gly Leu 485 490 495 MetTyr Trp Thr Asp Trp Gly Glu Asn Pro Lys Ile Glu Cys Ala Asn 500 505 510Leu Asp Gly Gln Glu Arg Arg Val Leu Val Asn Ala Ser Leu Gly Trp 515 520525 Pro Asn Gly Leu Ala Leu Asp Leu Gln Glu Gly Lys Leu Tyr Trp Gly 530535 540 Asp Ala Lys Thr Asp Lys Ile Glu Val Ile Asn Val Asp Gly Thr Lys545 550 555 560 Arg Arg Thr Leu Leu Glu Asp Lys Leu Pro His Ile Phe GlyPhe Thr 565 570 575 Leu Leu Gly Asp Phe Ile Tyr Trp Thr Asp Trp Gln ArgArg Ser Ile 580 585 590 Glu Arg Val His Lys Val Lys Ala Ser Arg Asp ValIle Ile Asp Gln 595 600 605 Leu Pro Asp Leu Met Gly Leu Lys Ala Val AsnVal Ala Lys Val Val 610 615 620 Gly Thr Asn Pro Cys Ala Asp Arg Asn GlyGly Cys Ser His Leu Cys 625 630 635 640 Phe Phe Thr Pro His Ala Thr ArgCys Gly Cys Pro Ile Gly Leu Glu 645 650 655 Leu Leu Ser Asp Met Lys ThrCys Ile Val Pro Glu Ala Phe Leu Val 660 665 670 Phe Thr Ser Arg Ala AlaIle His Arg Ile Ser Leu Glu Thr Asn Asn 675 680 685 Asn Asp Val Ala IlePro Leu Thr Gly Val Lys Glu Ala Ser Ala Leu 690 695 700 Asp Phe Asp ValSer Asn Asn His Ile Tyr Trp Thr Asp Val Ser Leu 705 710 715 720 Lys ThrIle Ser 
Arg Ala Phe Met Asn Gly Ser Ser Val Glu His Val 725 730 735 ValGlu Phe Gly Leu Asp Tyr Pro Glu Gly Met Ala Val Asp Trp Met 740 745 750Gly Lys Asn Leu Tyr Trp Ala Asp Thr Gly Thr Asn Arg Ile Glu Val 755 760765 Ala Arg Leu Asp Gly Gln Phe Arg Gln Val Leu Val Trp Arg Asp Leu 770775 780 Asp Asn Pro Arg Ser Leu Ala Leu Asp Pro Thr Lys Gly Tyr Ile Tyr785 790 795 800 Trp Thr Glu Trp Gly Gly Lys Pro Arg Ile Val Arg Ala PheMet Asp 805 810 815 Gly Thr Asn Cys Met Thr Leu Val Asp Lys Val Gly ArgAla Asn Asp 820 825 830 Leu Thr Ile Asp Tyr Ala Asp Gln Arg Leu Tyr TrpThr Asp Leu Asp 835 840 845 Thr Asn Met Ile Glu Ser Ser Asn Met Leu GlyGln Glu Arg Val Val 850 855 860 Ile Ala Asp Asp Leu Pro His Pro Phe GlyLeu Thr Gln Tyr Ser Asp 865 870 875 880 Tyr Ile Tyr Trp Thr Asp Trp AsnLeu His Ser Ile Glu Arg Ala Asp 885 890 895 Lys Thr Ser Gly Arg Asn ArgThr Leu Ile Gln Gly His Leu Asp Phe 900 905 910 Val Met Asp Ile Leu ValPhe His Ser Ser Arg Gln Asp Gly Leu Asn 915 920 925 Asp Cys Met His AsnAsn Gly Gln Cys Gly Gln Leu Cys Leu Ala Ile 930 935 940 Pro Gly Gly HisArg Cys Gly Cys Ala Ser His Tyr Thr Leu Asp Pro 945 950 955 960 Ser SerArg Asn Cys Ser Pro Pro Thr Thr Phe Leu Leu Phe Ser Gln 965 970 975 LysSer Ala Ile Ser Arg Met Ile Pro Asp Asp Gln His Ser Pro Asp 980 985 990Leu Ile Leu Pro Leu His Gly Leu Arg Asn Val Lys Ala Ile Asp Tyr 995 10001005 Asp Pro Leu Asp Lys Phe Ile Tyr Trp Val Asp Gly Arg Gln Asn Ile1010 1015 1020 Lys Arg Ala Lys Asp Asp Gly Thr Gln Pro Phe Val Leu ThrSer Leu 1025 1030 1035 1040 Ser Gln Gly Gln Asn Pro Asp Arg Gln Pro HisAsp Leu Ser Ile Asp 1045 1050 1055 Ile Tyr Ser Arg Thr Leu Phe Trp ThrCys Glu Ala Thr Asn Thr Ile 1060 1065 1070 Asn Val His Arg Leu Ser GlyGlu Ala Met Gly Val Val Leu Arg Gly 1075 1080 1085 Asp Arg Asp Lys ProArg Ala Ile Val Val Asn Ala Glu Arg Gly Tyr 1090 1095 1100 Leu Tyr PheThr Asn Met Gln Asp Arg Ala Ala Lys Ile Glu Arg Ala 1105 1110 1115 1120Ala Leu Asp Gly Thr Glu Arg Glu Val Leu Phe Thr Thr Gly Leu Ile 11251130 1135 Arg Pro Val Ala Leu 
Val Val Asp Asn Thr Leu Gly Lys Leu PheTrp 1140 1145 1150 Val Asp Ala Asp Leu Lys Arg Ile Glu Ser Cys Asp LeuSer Gly Ala 1155 1160 1165 Asn Arg Leu Thr Leu Glu Asp Ala Asn Ile ValGln Pro Leu Gly Leu 1170 1175 1180 Thr Ile Leu Gly Lys His Leu Tyr TrpIle Asp Arg Gln Gln Gln Met 1185 1190 1195 1200 Ile Glu Arg Val Glu LysThr Thr Gly Asp Lys Arg Thr Arg Ile Gln 1205 1210 1215 Gly Arg Val AlaHis Leu Thr Gly Ile His Ala Val Glu Glu Val Ser 1220 1225 1230 Leu GluGlu Phe Ser Ala His Pro Cys Ala Arg Asp Asn Gly Gly Cys 1235 1240 1245Ser His Ile Cys Ile Ala Lys Gly Asp Gly Thr Pro Arg Cys Ser Cys 12501255 1260 Pro Val His Leu Val Leu Leu Gln Asn Leu Leu Thr Cys Gly GluPro 1265 1270 1275 1280 Pro Thr Cys Ser Pro Asp Gln Phe Ala Cys Ala ThrGly Glu Ile Asp 1285 1290 1295 Cys Ile Pro Gly Ala Trp Arg Cys Asp GlyPhe Pro Glu Cys Asp Asp 1300 1305 1310 Gln Ser Asp Glu Glu Gly Cys ProVal Cys Ser Ala Ala Gln Phe Pro 1315 1320 1325 Cys Ala Arg Gly Gln CysVal Asp Leu Arg Leu Arg Cys Asp Gly Glu 1330 1335 1340 Ala Asp Cys GlnAsp Arg Ser Asp Glu Ala Asp Cys Asp Ala Ile Cys 1345 1350 1355 1360 LeuPro Asn Gln Phe Arg Cys Ala Ser Gly Gln Cys Val Leu Ile Lys 1365 13701375 Gln Gln Cys Asp Ser Phe Pro Asp Cys Ile Asp Gly Ser Asp Glu Leu1380 1385 1390 Met Cys Glu Ile Thr Lys Pro Pro Ser Asp Asp Ser Pro AlaHis Ser 1395 1400 1405 Ser Ala Ile Gly Pro Val Ile Gly Ile Ile Leu SerLeu Phe Val Met 1410 1415 1420 Gly Gly Val Tyr Phe Val Cys Gln Arg ValVal Cys Gln Arg Tyr Ala 1425 1430 1435 1440 Gly Ala Asn Gly Pro Phe ProHis Glu Tyr Val Ser Gly Thr Pro His 1445 1450 1455 Val Pro Leu Asn PheIle Ala Pro Gly Gly Ser Gln His Gly Pro Phe 1460 1465 1470 Thr Gly IleAla Cys Gly Lys Ser Met Met Ser Ser Val Ser Leu Met 1475 1480 1485 GlyGly Arg Gly Gly Val Pro Leu Tyr Asp Arg Asn His Val Thr Gly 1490 14951500 Ala Ser Ser Ser Ser Ser Ser Ser Thr Lys Ala Thr Leu Tyr Pro Pro1505 1510 1515 1520 Ile Leu Asn Pro Pro Pro Ser Pro Ala Thr Asp Pro SerLeu Tyr Asn 1525 1530 1535 Met Asp Met Phe Tyr Ser Ser Asn Ile Pro AlaThr 
Ala Arg Pro Tyr 1540 1545 1550 Arg Pro Tyr Ile Ile Arg Gly Met AlaPro Pro Thr Thr Pro Cys Ser 1555 1560 1565 Thr Asp Val Cys Asp Ser AspTyr Ser Ala Ser Arg Trp Lys Ala Ser 1570 1575 1580 Lys Tyr Tyr Leu AspLeu Asn Ser Asp Ser Asp Pro Tyr Pro Pro Pro 1585 1590 1595 1600 Pro ThrPro His Ser Gln Tyr Leu Ser Ala Glu Asp Ser Cys Pro Pro 1605 1610 1615Ser Pro Ala Thr Glu Arg Ser Tyr Phe His Leu Phe Pro Pro Pro Pro 16201625 1630 Ser Pro Cys Thr Asp Ser Ser 1635 91 base pairs nucleic acidsingle linear 30 TATAAAATGG CTTGGCAAAG GGAGTTCATT CCTTTTAGCG CTTCCATCTTCTGCAGTGAG 60 AGGACACCGC ATTCTTCTTC TCCAGAGGAT G 91 5263 base pairsnucleic acid single linear 31 TAAGAGTATA AAGGGCTCCT GAGACCAAAAAGGTTGAGAA CCAGTGCTTT AAAGCTTGAT 60 GTTTCTCAGG GTTTCATCCT TTGTGGATTAATGCCCATTA TAAAATGGCT TGGCAAAGGG 120 AGTTCATTCC TTTTAGCGCT TCCATCTTCTGCAGTGAGAG GACACCGCAT TCTTCTTCTC 180 CAGAGGATGC AGCAGCAAGG CGCCATCTTGAAACCAGAGA CCAAACCAAC CAGCAACTTC 240 GTCTTGAACT TCCCAGCCTC CACAACTCAGCAGTCTGTGC AGGACCCTGT GAGCAGAGCC 300 GCAGCCTCGC CGCTCCTGCT ATTTGCCAACCGCCGGGACG TACGGCTGGT GGACGCCGGC 360 GGAGTCAAGC TGGAGTCCAC CATCGTGGTCAGCGGCCTGG AGGATGCGGC CGCAGTGGAC 420 TTCCAGTTTT CCAAGGGAGC CGTGTACTGGACAGACGTGA GCGAGGAGGC CATCAAGCAG 480 ACCTACCTGA ACCAGACGGG GGCCGCCGTGCAGAACGTGG TCATCTCCGG CCTGGTCTCT 540 CCCGACGGCC TCGCCTGCGA CTGGGTGGGCAAGAAGCTGT ACTGGACGGA CTCAGAGACC 600 AACCGCATCG AGGTGGCCAA CCTCAATGGCACATCCCGGA AGGTGCTCTT CTGGCAGGAC 660 CTTGACCAGC CGAGGGCCAT CGCCTTGGACCCCGCTCACG GGTACATGTA CTGGACAGAC 720 TGGGGTGAGA CGCCCCGGAT TGAGCGGGCAGGGATGGATG GCAGCACCCG GAAGATCATT 780 GTGGACTCGG ACATTTACTG GCCCAATGGACTGACCATCG ACCTGGAGGA GCAGAAGCTC 840 TACTGGGCTG ACGCCAAGCT CAGCTTCATCCACCGTGCCA ACCTGGACGG CTCGTTCCGG 900 CAGAAGGTGG TGGAGGGCAG CCTGACGCACCCCTTCGCCC TGACGCTCTC CGGGGACACT 960 CTGTACTGGA CAGACTGGCA GACCCGCTCCATCCATGCCT GCAACAAGCG CACTGGGGGG 1020 AAGAGGAAGG AGATCCTGAG TGCCCTCTACTCACCCATGG ACATCCAGGT GCTGAGCCAG 1080 GAGCGGCAGC CTTTCTTCCA CACTCGCTGTGAGGAGGACA ATGGCGGCTG CTCCCACCTG 1140 TGCCTGCTGT CCCCAAGCGA 
GCCTTTCTACACATGCGCCT GCCCCACGGG TGTGCAGCTG 1200 CAGGACAACG GCAGGACGTG TAAGGCAGGAGCCGAGGAGG TGCTGCTGCT GGCCCGGCGG 1260 ACGGACCTAC GGAGGATCTC GCTGGACACGCCGGACTTTA CCGACATCGT GCTGCAGGTG 1320 GACGACATCC GGCACGCCAT TGCCATCGACTACGACCCGC TAGAGGGCTA TGTCTACTGG 1380 ACAGATGACG AGGTGCGGGC CATCCGCAGGGCGTACCTGG ACGGGTCTGG GGCGCAGACG 1440 CTGGTCAACA CCGAGATCAA CGACCCCGATGGCATCGCGG TCGACTGGGT GGCCCGAAAC 1500 CTCTACTGGA CCGACACGGG CACGGACCGCATCGAGGTGA CGCGCCTCAA CGGCACCTCC 1560 CGCAAGATCC TGGTGTCGGA GGACCTGGACGAGCCCCGAG CCATCGCACT GCACCCCGTG 1620 ATGGGCCTCA TGTACTGGAC AGACTGGGGAGAGAACCCTA AAATCGAGTG TGCCAACTTG 1680 GATGGGCAGG AGCGGCGTGT GCTGGTCAATGCCTCCCTCG GGTGGCCCAA CGGCCTGGCC 1740 CTGGACCTGC AGGAGGGGAA GCTCTACTGGGGAGACGCCA AGACAGACAA GATCGAGGTG 1800 ATCAATGTTG ATGGGACGAA GAGGCGGACCCTCCTGGAGG ACAAGCTCCC GCACATTTTC 1860 GGGTTCACGC TGCTGGGGGA CTTCATCTACTGGACTGACT GGCAGCGCCG CAGCATCGAG 1920 CGGGTGCACA AGGTCAAGGC CAGCCGGGACGTCATCATTG ACCAGCTGCC CGACCTGATG 1980 GGGCTCAAAG CTGTGAATGT GGCCAAGGTCGTCGGAACCA ACCCGTGTGC GGACAGGAAC 2040 GGGGGGTGAG CCACCTGTGC TTCTTCACACCCCACGCAAC CCGGTGTGGC TGCCCCATCG 2100 GCCTGGAGCT GCTGAGTGAC ATGAAGACCTGCATCGTGCC TGAGGCCTTC TTGGTCTTCA 2160 CCAGCAGAGC CGCCATCCAC AGGATCTCCCTCGAGACCAA TAACAACGAC GTGGCCATCC 2220 CGCTCACGGG CGTCAAGGAG GCCTCAGCCCTGGACTTTGA TGTGTCCAAC AACCACATCT 2280 ACTGGACAGA CGTCAGCCTG AAGACCATCAGCCGCGCCTT CATGAACGGG AGCTCGGTGG 2340 AGCACGTGGT GGAGTTTGGC CTTGACTACCCCGAGGGCAT GGCCGTTGAC TGGATGGGCA 2400 AGAACCTCTA CTGGGCCGAC ACTGGGACCAACAGAATCGA AGTGGCGCGG CTGGACGGGC 2460 AGTTCCGGCA AGTCCTCGTG TGGAGGGACTTGGACAACCC GAGGTCGCTG GCCCTGGATC 2520 CCACCAAGGG CTACATCTAC TGGACCGAGTGGGGCGGCAA GCCGAGGATC GTGCGGGCCT 2580 TCATGGACGG GACCAACTGC ATGACGCTGGTGGACAAGGT GGGCCGGGCC AACGACCTCA 2640 CCATTGACTA CGCTGACCAG CGCCTCTACTGGACCGACCT GGACACCAAC ATGATCGAGT 2700 CGTCCAACAT GCTGGGTCAG GAGCGGGTCGTGATTGCCGA CGATCTCCCG CACCCGTTCG 2760 GTCTGACGCA GTACAGCGAT TATATCTACTGGACAGACTG GAATCTGCAC AGCATTGAGC 2820 GGGCCGACAA GACTAGCGGC CGGAACCGCACCCTCATCCA GGGCCACCTG 
GACTTCGTGA 2880 TGGACATCCT GGTGTTCCAC TCCTCCCGCCAGGATGGCCT CAATGACTGT ATGCACAACA 2940 ACGGGCAGTG TGGGCAGCTG TGCCTTGCCATCCCCGGCGG CCACCGCTGC GGCTGCGCCT 3000 CACACTACAC CCTGGACCCC AGCAGCCGCAACTGCAGCCC GCCCACCACC TTCTTGCTGT 3060 TCAGCCAGAA ATCTGCCATC AGTCGGATGATCCCGGACGA CCAGCACAGC CCGGATCTCA 3120 TCCTGCCCCT GCATGGACTG AGGAACGTCAAAGCCATCGA CTATGACCCA CTGGACAAGT 3180 TCATCTACTG GGTGGATGGG CGCCAGAACATCAAGCGAGC CAAGGACGAC GGGACCCAGC 3240 CCTTTGTTTT GACCTCTCTG AGCCAAGGCCAAAACCCAGA CAGGCAGCCC CACGACCTCA 3300 GCATCGACAT CTACAGCCGG ACACTGTTCTGGACGTGCGA GGCCACCAAT ACCATCAACG 3360 TCCACAGGCT GAGCGGGGAA GCCATGGGGGTGGTGCTGCG TGGGGACCGC GACAAGCCCA 3420 GGGCCATCGT CGTCAACGCG GAGCGAGGGTACCTGTACTT CACCAACATG CAGGACCGGG 3480 CAGCCAAGAT CGAACGCGCA GCCCTGGACGGCACCGAGCG CGAGGTCCTC TTCACCACCG 3540 GCCTCATCCG CCCTGTGGCC CTGGTGGTAGACAACACACT GGGCAAGCTG TTCTGGGTGG 3600 ACGCGGACCT GAAGCGCATT GAGAGCTGTGACCTGTCAGG GGCCAACCGC CTGACCCTGG 3660 AGGACGCCAA CATCGTGCAG CCTCTGGGCCTGACCATCCT TGGCAAGCAT CTCTACTGGA 3720 TCGACCGCCA GCAGCAGATG ATCGAGCGTGTGGAGAAGAC CACCGGGGAC AAGCGGACTC 3780 GCATCCAGGG CCGTGTCGCC CACCTCACTGGCATCCATGC AGTGGAGGAA GTCAGCCTGG 3840 AGGAGTTCTC AGCCCACCCA TGTGCCCGTGACAATGGTGG CTGCTCCCAC ATCTGTATTG 3900 CCAAGGGTGA TGGGACACCA CGGTGCTCATGCCCAGTCCA CCTCGTGCTC CTGCAGAACC 3960 TGCTGACCTG TGGAGAGCCG CCCACCTGCTCCCCGGACCA GTTTGCATGT GCCACAGGGG 4020 AGATCGACTG TATCCCCGGG GCCTGGCGCTGTGACGGCTT TCCCGAGTGC GATGACCAGA 4080 GCGACGAGGA GGGCTGCCCC GTGGCTCCGCCGCCCAGTTC CCCTGCGCGC GGGGTCAGTG 4140 TGTGGACCTG CGCCTGCGCT GCGACGGCGAGGCAGACTGT CAGGACCGCT CAGACGAGGC 4200 GGACTGTGAC GCCATCTGCC TGCCCAACCAGTTCCGGTGT GCGAGCGGCC AGTGTGTCCT 4260 CATCAAACAG CAGTGCGACT CCTTCCCCGACTGTATCGAC GGCTCCGACG AGCTCATGTG 4320 TGAAATCACC AAGCCGCCCT CAGACGACAGCCCGGCCCAC AGCAGTGCCA TCGGGCCCGT 4380 CATTGGCATC ATCCTCTCTC TCTTCGTCATGGGTGGTGTC TATTTTGTGT GCCAGCGCGT 4440 GGTGTGCCAG CGCTATGCGG GGGCCAACGGGCCCTTCCCG CACGAGTATG TCAGCGGGAC 4500 CCCGCACGTG CCCCTCAATT TCATAGCCCCGGGCGGTTCC CAGCATGGCC CCTTCACAGG 4560 CATCGCATGC GGAAAGTCCA 
TGATGAGCTCCGTGAGCCTG ATGGGGGGCC GGGGCGGGGT 4620 GCCCCTCTAC GACCGGAACC ACGTCACAGGGGCCTCGTCC AGCAGCTCGT CCAGCACGAA 4680 GGCCACGCTG TACCCGCGGA TCCTGAACCCGCCGCCCTCC CCGGCCACGG ACCCCTCCCT 4740 GTACAACATG GACATGTTCT ACTCTTCAAACATTCCGGCC ACTGCGAGAC CGTACAGGCC 4800 CTACATCATT CGAGGAATGG CGCCCCCGACGACGCCCTGC AGCACCGACG TGTGTGACAG 4860 CGACTACAGC GCCAGCCGCT GGAAGGCCAGCAAGTACTAC CTGGATTTGA ACTCGGACTC 4920 AGACCCCTAT CCACCCCCAC CCACGCCCCACAGCCAGTAC CTGTCGGCGG AGGACAGCTG 4980 CCCGCCCTCG CCCGCCACCG AGAGGAGCTACTTCCATCTC TTCCCGCCCC CTCCGTCCCC 5040 CTGCACGGAC TCATCCTGAC CTCGGCCGGGCCACTCTGGC TTCTCTGTGC CCCTGTAAAT 5100 AGTTTTAAAT ATGAACAAAG AAAAAAATATATTTTATGAT TTAAAAAATA AATATAATTG 5160 GGATTTTAAA AACATGAGAA ATGTGAACTGTGATGGGGTG GGCAGGGCTG GGAGAACTTT 5220 GTACAGTGGA ACAAATATTT ATAAACTTAATTTTGTAAAA CAG 5263 5022 base pairs nucleic acid single linear 32GGCTGGTCTT GAACTCCTGG CCTGAGATGA TCCTCTCTCC TCGGAAAGTG CTGGGATTAT 60AGCCTCGCCG CTCCTGCTAT TTGCCAACCG CCGGGACGTA CGGCTGGTGG ACGCCGGCGG 120AGTCAAGCTG GAGTCCACCA TCGTGGTCAG CGGCCTGGAG GATGCGGCCG CAGTGGACTT 180CCAGTTTTCC AAGGGAGCCG TGTACTGGAC AGACGGAGCG AGGAGGCCAT CAAGCAGACC 240TACCTGAACC AGACGGGGGC CGCCGTGCAG AACGTGGTCA TCTCCGGCCT GGTCTCTCCC 300GACGGCCTCG CCTGCGACTG GGTGGGCAAG AAGCTGTACT GGACGGACTC AGAGACCAAC 360CGCATCGAGG TGGCCAACCT CAATGGCACA TCCCGGAAGG TGCTCTTCTG GCAGGACCTT 420GACCAGCCGA GGGCCATCGC CTTGGACCCC GCTCACGGGT ACATGTACTG GACAGACTGG 480GGTGAGACGC CCCGGATTGA GCGGGCAGGG ATGGATGGCA GCACCCGGAA GATCATTGTG 540GACTCGGACA TTTACTGGCC CAATGGACTG ACCATCGACC TGGAGGAGCA GAAGCTCTAC 600TGGGCTGACG CCAAGCTCAG CTTCATCCAC CGTGCCAACC TGGACGGCTC GTTCCGGCAG 660AAGGTGGTGG AGGGCAGCCT GACGCACCCC TTCGCCCTGA CGCTCTCCGG GGACACTCTG 720TACTGGACAG ACTGGCAGAC CCGCTCCATC CATGCCTGCA ACAAGCGCAC TGGGGGGAAG 780AGGAAGGAGA TCCTGAGTGC CCTCTACTCA CCCATGGACA TCCAGGTGCT GAGCCAGGAG 840CGGCAGCCTT TCTTCCACAC TCGCTGTGAG GAGGACAATG GCGGCTGCTC CCACCTGTGC 900CTGCTGTCCC CAAGCGAGCC TTTCTACACA TGCGCCTGCC CCACGGGTGT GCAGCTGCAG 960GACAACGGCA GGACGTGTAA GGCAGGAGCC GAGGAGGTGC 
TGCTGCTGGC CCGGCGGACG 1020GACCTACGGA GGATCTCGCT GGACACGCCG GACTTTACCG ACATCGTGCT GCAGGTGGAC 1080GACATCCGGC ACGCCATTGC CATCGACTAC GACCCGCTAG AGGGCTATGT CTACTGGACA 1140GATGACGAGG TGCGGGCCAT CCGCAGGGCG TACCTGGACG GGTCTGGGGC GCAGACGCTG 1200GTCAACACCG AGATCAACGA CCCCGATGGC ATCGCGGTCG ACTGGGTGGC CCGAAACCTC 1260TACTGGACCG ACACGGGCAC GGACCGCATC GAGGTGACGC GCCTCAACGG CACCTCCCGC 1320AAGATCCTGG TGTCGGAGGA CCTGGACGAG CCCCGAGCCA TCGCACTGCA CCCCGTGATG 1380GGCCTCATGT ACTGGACAGA CTGGGGAGAG AACCCTAAAA TCGAGTGTGC CAACTTGGAT 1440GGGCAGGAGC GGCGTGTGCT GGTCAATGCC TCCCTCGGGT GGCCCAACGG CCTGGCCCTG 1500GACCTGCAGG AGGGGAAGCT CTACTGGGGA GACGCCAAGA CAGACAAGAT CGAGGTGATC 1560AATGTTGATG GGACGAAGAG GCGGACCCTC CTGGAGGACA AGCTCCCGCA CATTTTCGGG 1620TTCACGCTGC TGGGGGACTT CATCTACTGG ACTGACTGGC AGCGCCGCAG CATCGAGCGG 1680GTGCACAAGG TCAAGGCCAG CCGGGACGTC ATCATTGACC AGCTGCCCGA CCTGATGGGG 1740CTCAAAGCTG TGAATGTGGC CAAGGTCGTC GGAACCAACC CGTGTGCGGA CAGGAACGGG 1800GGGTGCAGCC ACCTGTGCTT CTTCACACCC CACGCAACCC GGTGTGGCTG CCCCATCGGC 1860CTGGAGCTGC TGAGTGACAT GAAGACCTGC ATCGTGCCTG AGGCCTTCTT GGTCTTCACC 1920AGCAGAGCCG CCATCCACAG GATCTCCCTC GAGACCAATA ACAACGACGT GGCCATCCCG 1980CTCACGGGCG TCAAGGAGGC CTCAGCCCTG GACTTTGATG TGTCCAACAA CCACATCTAC 2040TGGACAGACG TCAGCCTGAA GACCATCAGC CGCGCCTTCA TGAACGGGAG CTCGGTGGAG 2100CACGTGGTGG AGTTTGGCCT TGACTACCCC GAGGGCATGG CCGTTGACTG GATGGGCAAG 2160AACCTCTACT GGGCCGACAC TGGGACCAAC AGAATCGAAG TGGCGCGGCT GGACGGGCAG 2220TTCCGGCAAG TCCTCGTGTG GAGGGACTTG GACAACCCGA GGTCGCTGGC CCTGGATCCC 2280ACCAAGGGCT ACATCTACTG GACCGAGTGG GGCGGCAAGC CGAGGATCGT GCGGGCCTTC 2340ATGGACGGGA CCAACTGCAT GACGCTGGTG GACAAGGTGG GCCGGGCCAA CGACCTCACC 2400ATTGACTACG CTGACCAGCG CCTCTACTGG ACCGACCTGG ACACCAACAT GATCGAGTCG 2460TCCAACATGC TGGGTCAGGA GCGGGTCGTG ATTGCCGACG ATCTCCCGCA CCCGTTCGGT 2520CTGACGCAGT ACAGCGATTA TATCTACTGG ACAGACTGGA ATCTGCACAG CATTGAGCGG 2580GCCGACAAGA CTAGCGGCCG GAACCGCACC CTCATCCAGG GCCACCTGGA CTTCGTGATG 2640GACATCCTGG TGTTCCACTC CTCCCGCCAG GATGGCCTCA ATGACTGTAT GCACAACAAC 2700GGGCAGTGTG 
GGCAGCTGTG CCTTGCCATC CCCGGCGGCC ACCGCTGCGG CTGCGCCTCA 2760CACTACACCC TGGACCCCAG CAGCCGCAAC TGCAGCCCGC CCACCACCTT CTTGCTGTTC 2820AGCCAGAAAT CTGCCATCAG TCGGATGATC CCGGACGACC AGCACAGCCC GGATCTCATC 2880CTGCCCCTGC ATGGACTGAG GAACGTCAAA GCCATCGACT ATGACCCACT GGACAAGTTC 2940ATCTACTGGG TGGATGGGCG CCAGAACATC AAGCGAGCCA AGGACGACGG GACCCAGCCC 3000TTTGTTTTGA CCTCTCTGAG CCAAGGCCAA AACCCAGACA GGCAGCCCCA CGACCTCAGC 3060ATCGACATCT ACAGCCGGAC ACTGTTCTGG ACGTGCGAGG CCACCAATAC CATCAACGTC 3120CACAGGCTGA GCGGGGAAGC CATGGGGGTG GTGCTGCGTG GGGACCGCGA CAAGCCCAGG 3180GCCATCGTCG TCAACGCGGA GCGAGGGTAC CTGTACTTCA CCAACATGCA GGACCGGGCA 3240GCCAAGATCG AACGCGCAGC CCTGGACGGC ACCGAGCGCG AGGTCCTCTT CACCACCGGC 3300CTCATCCGCC CTGTGGCCCT GGTGGTAGAC AACACACTGG GCAAGCTGTT CTGGGTGGAC 3360GCGGACCTGA AGCGCATTGA GAGCTGTGAC CTGTCAGGGG CCAACCGCCT GACCCTGGAG 3420GACGCCAACA TCGTGCAGCC TCTGGGCCTG ACCATCCTTG GCAAGCATCT CTACTGGATC 3480GACCGCCAGC AGCAGATGAT CGAGCGTGTG GAGAAGACCA CCGGGGACAA GCGGACTCGC 3540ATCCAGGGCC GTGTCGCCCA CCTCACTGGC ATCCATGCAG TGGAGGAAGT CAGCCTGGAG 3600GAGTTCTCAG CCCACCCATG TGCCCGTGAC AATGGTGGCT GCTCCCACAT CTGTATTGCC 3660AAGGGTGATG GGACACCACG GTGCTCATGC CCAGTCCACC TCGTGCTCCT GCAGAACCTG 3720CTGACCTGTG GAGAGCCGCC CACCTGCTCC CCGGACCAGT TTGCATGTGC CACAGGGGAG 3780ATCGACTGTA TCCCCGGGGC CTGGCGCTGT GACGGCTTTC CCGAGTGCGA TGACCAGAGC 3840GACGAGGAGG GCTGCCCCGT GTGCTCCGCC GCCCAGTTCC CCTGCGCGCG GGGTCAGTGT 3900GTGGACCTGC GCCTGCGCTG CGACGGCGAG GCAGACTGTC AGGACCGCTC AGACGAGGCG 3960GACTGTGACG CCATCTGCCT GCCCAACCAG TTCCGGTGTG CGAGCGGCCA GTGTGTCCTC 4020ATCAAACAGC AGTGCGACTC CTTCCCCGAC TGTATCGACG GCTCCGACGA GCTCATGTGT 4080GAAATCACCA AGCCGCCCTC AGACGACAGC CCGGCCCACA GCAGTGCCAT CGGGCCCGTC 4140ATTGGCATCA TCCTCTCTCT CTTCGTCATG GGTGGTGTCT ATTTTGTGTG CCAGCGCGTG 4200GTGTGCCAGC GCTATGCGGG GGCCAACGGG CCCTTCCCGC ACGAGTATGT CAGCGGGACC 4260CCGCACGTGC CCCTCAATTT CATAGCCCCG GGCGGTTCCC AGCATGGCCC CTTCACAGGC 4320ATCGCATGCG GAAAGTCCAT GATGAGCTCC GTGAGCCTGA TGGGGGGCCG GGGCGGGGTG 4380CCCCTCTACG ACCGGAACCA CGTCACAGGG GCCTCGTCCA 
GCAGCTCGTC CAGCACGAAG 4440GCCACGCTGT ACCCGCCGAT CCTGAACCCG CCGCCCTCCC CGGCCACGGA CCCCTCCCTG 4500TACAACATGG ACATGTTCTA CTCTTCAAAC ATTCCGGCCA CTGTGAGACC GTACAGGCCC 4560TACATCATTC GAGGAATGGC GCCCCCGACG ACGCCCTGCA GCACCGACGT GTGTGACAGC 4620GACTACAGCG CCAGCCGCTG GAAGGCCAGC AAGTACTACC TGGATTTGAA CTCGGACTCA 4680GACCCCTATC CACCCCCACC CACGCCCCAC AGCCAGTACC TGTCGGCGGA GGACAGCTGC 4740CCGCCCTCGC CCGCCACCGA GAGGAGCTAC TTCCATCTCT TCCCGCCCCC TCCGTCCCCC 4800TGCACGGACT CATCCTGACC TCGGCCGGGC CACTCTGGCT TCTCTGTGCC CCTGTAAATA 4860GTTTTAAATA TGAACAAAGA AAAAAATATA TTTTATGATT TAAAAAATAA ATATAATTGG 4920GATTTTAAAA ACATGAGAAA TGTGAACTGT GATGGGGTGG GCAGGGCTGG GAGAACTTTG 4980TACAGTGGAA CAAATATTTA TAAACTTAAT TTTGTAAAAC AG 5022 5162 base pairsnucleic acid single linear 33 AGGCTGGTCT CAAACTCCTG GCCTTAAGTGATCTGCCCGC CTCGGCCTCC CAAAGTGCTG 60 AGATGACAGG TGTGAGCCAC CGTGCCCGGCCCAGAACTCT TTAATTCCCA CCTGAAACTT 120 GCCGCCTTAA GCAGGTCCCC AGTCTCCCTCCCCTAGTCCC TGGTCCCACC ATTCTGCTTT 180 CTGTCTCAAT GAATTTGCCT ACCCCTCGCCGCTCCTGCTA TTTGCCAACC GCCGGGACGT 240 ACGGCTGGTG GACGCCGGCG GAGTCAAGCTGGAGTCCACC ATCGTGGTCA GCGGCCTGGA 300 GGATGCGGCC GCAGTGGACT TCCAGTTTTCCAAGGGAGCC GTGTACTGGA CAGACGTGAG 360 CGAGGAGGCC ATCAAGCAGA CCTACCTGAACCAGACGGGG GCCGCCGTGC AGAACGTGGT 420 CATCTCCGGC CTGGTCTCTC CCGACGGCCTCGCCTGCGAC TGGGTGGGCA AGAAGCTGTA 480 CTGGACGGAC TCAGAGACCA ACCGCATCGAGGTGGCCAAC CTCAATGGCA CATCCCGGAA 540 GGTGCTCTTC TGGCAGGACC TTGACCAGCCGAGGGCCATC GCCTTGGACC CCGCTCACGG 600 GTACATGTAC TGGACAGACT GGGGTGAGACGCCCCGGATT GAGCGGGCAG GGATGGATGG 660 CAGCACCCGG AAGATCATTG TGGACTCGGACATTTACTGG CCCAATGGAC TGACCATCGA 720 CCTGGAGGAG CAGAAGCTCT ACTGGGCTGACGCCAAGCTC AGCTTCATCC ACCGTGCCAA 780 CCTGGACGGC TCGTTCCGGC AGAAGGTGGTGGAGGGCAGC CTGACGCACC CCTTCGCCCT 840 GACGCTCTCC GGGGACACTC TGTACTGGACAGACTGGCAG ACCCGCTCCA TCCATGCCTG 900 CAACAAGCGC ACTGGGGGGA AGAGGAAGGAGATCCTGAGT GCCCTCTACT CACCCATGGA 960 CATCCAGGTG CTGAGCCAGG AGCGGCAGCCTTTCTTCCAC ACTCGCTGTG AGGAGGACAA 1020 TGGCGGCTGC TCCCACCTGT GCCTGCTGTCCCCAAGCGAG CCTTTCTACA CATGCGCCTG 1080 
CCCCACGGGT GTGCAGCTGC AGGACAACGGCAGGACGTGT AAGGCAGGAG CCGAGGAGGT 1140 GCTGCTGCTG GCCCGGCGGA CGGACCTACGGAGGATCTCG CTGGACACGC CGGACTTTAC 1200 CGACATCGTG CTGCAGGTGG ACGACATCCGGCACGCCATT GCCATCGACT ACGACCCGCT 1260 AGAGGGCTAT GTCTACTGGA CAGATGACGAGGTGCGGGCC ATCCGCAGGG CGTACCTGGA 1320 CGGGTCTGGG GCGCAGACGC TGGTCAACACCGAGATCAAC GACCCCGATG GCATCGCGGT 1380 CGACTGGGTG GCCCGAAACC TCTACTGGACCGACACGGGC ACGGACCGCA TCGAGGTGAC 1440 GCGCCTCAAC GGCACCTCCC GCAAGATCCTGGTGTCGGAG GACCTGGACG AGCCCCGAGC 1500 CATCGCACTG CACCCCGTGA TGGGCCTCATGTACTGGACA GACTGGGGAG AGAACCCTAA 1560 AATCGAGTGT GCCAACTTGG ATGGGCAGGAGCGGCGTGTG CTGGTCAATG CCTCCCTCGG 1620 GTGGCCCAAC GGCCTGGCCC TGGACCTGCAGGAGGGGAAG CTCTACTGGG GAGACGCCAA 1680 GACAGACAAG ATCGAGGTGA TCAATGTTGATGGGACGAAG AGGCGGACCC TCCTGGAGGA 1740 CAAGCTCCCG CACATTTTCG GGTTCACGCTGCTGGGGGAC TTCATCTACT GGACTGACTG 1800 GCAGCGCCGC AGCATCGAGC GGGTGCACAAGGTCAAGGCC AGCCGGGACG TCATCATTGA 1860 CCAGCTGCCC GACCTGATGG GGCTCAAAGCTGTGAATGTG GCCAAGGTCG TCGGAACCAA 1920 CCCGTGTGCG GACAGGAACG GGGGGTGCAGCCACCTGTGC TTCTTCACAC CCCACGCAAC 1980 CCGGTGTGGC TGCCCCATCG GCCTGGAGCTGCTGAGTGAC ATGAAGACCT GCATCGTGCC 2040 TGAGGCCTCT TGGTCTTCAC CAGCAGAGCCGCCATCCACA GGATCTCCCT CGAGACCAAT 2100 AACAACGACG TGGCCATCCC GCTCACGGGCGTCAAGGAGG CCTCAGCCCT GGACTTTGAT 2160 GTGTCCAACA ACCACATCTA CTGGACAGACGTCAGCCTGA AGACCATCAG CCGCGCCTTC 2220 ATGAACGGGA GCTCGGTGGA GCACGTGGTGGAGTTTGGCC TTGACTACCC CGAGGGCATG 2280 GCCGTTGACT GGATGGGCAA GAACCTCTACTGGGCCGACA CTGGGACCAA CAGAATCGAA 2340 GTGGCGCGGC TGGACGGGCA GTTCCGGCAAGTCCTCGTGT GGAGGGACTT GGACAACCCG 2400 AGGTCGCTGG CCCTGGATCC CACCAAGGGCTACATCTACT GGACCGAGTG GGGCGGCAAG 2460 CCGAGGATCG TGCGGGCCTT CATGGACGGGACCAACTGCA TGACGCTGGT GGACAAGGTG 2520 GGCCGGGCCA ACGACCTCAC CATTGACTACGCTGACCAGC GCCTCTACTG GACCGACCTG 2580 GACACCAACA TGATCGAGTC GTCCAACATGCTGGGTCAGG AGCGGGTCGT GATTGCCGAC 2640 GATCTCCCGC ACCCGTTCGG TCTGACGCAGTACAGCGATT ATATCTACTG GACAGACTGG 2700 AATCTGCACA GCATTGAGCG GGCCGACAAGACTAGCGGCC GGAACCGCAC CCTCATCCAG 2760 GGCCACCTGG ACTTCGTGAT 
GGACATCCTGGTGTTCCACT CCTCCCGCCA GGATGGCCTC 2820 AATGACTGTA TGCACAACAA CGGGCAGTGTGGGCAGCTGT GCCTTGCCAT CCCCGGCGGC 2880 CACCGCTGCG GCTGCGCCTC ACACTACACCCTGGACCCCA GCAGCCGCAA CTGCAGCCCG 2940 CCCACCACCT TCTTGCTGTT CAGCCAGAAATCTGCCATCA GTCGGATGAT CCCGGACGAC 3000 CAGCACAGCC CGGATCTCAT CCTGCCCCTGCATGGACTGA GGAACGTCAA AGCCATCGAC 3060 TATGACCCAC TGGACAAGTT CATCTACTGGGTGGATGGGC GCCAGAACAT CAAGCGAGCC 3120 AAGGACGACG GGACCCAGCC CTTTGTTTTGACCTCTCTGA GCCAAGGCCA AAACCCAGAC 3180 AGGCAGCCCC ACGACCTCAG CATCGACATCTACAGCCGGA CACTGTTCTG GACGTGCGAG 3240 GCCACCAATA CCATCAACGT CCACAGGCTGAGCGGGGAAG CCATGGGGGT GGTGCTGCGT 3300 GGGGACCGCG ACAAGCCCAG GGCCATCGTCGTCAACGCGG AGCGAGGGTA CCTGTACTTC 3360 ACCAACATGC AGGACCGGGC AGCCAAGATCGAACGCGCAG CCCTGGACGG CACCGAGCGC 3420 GAGGTCCTCT TCACCACCGG CCTCATCCGCCCTGTGGCCC TGGTGGTAGA CAACACACTG 3480 GGCAAGCTGT TCTGGGTGGA CGCGGACCTGAAGCGCATTG AGAGCTGTGA CCTGTCAGGG 3540 GCCAACCGCC TGACCCTGGA GGACGCCAACATCGTGCAGC CTCTGGGCCT GACCATCCTT 3600 GGCAAGCATC TCTACTGGAT CGACCGCCAGCAGCAGATGA TCGAGCGTGT GGAGAAGACC 3660 ACCGGGGACA AGCGGACTCG CATCCAGGGCCGTGTCGCCC ACCTCACTGG CATCCATGCA 3720 GTGGAGGAAG TCAGCCTGGA GGAGTTCTCAGCCCACCCAT GTGCCCGTGA CAATGGTGGC 3780 TGCTCCCACA TCTGTATTGC CAAGGGTGATGGGACACCAC GGTGCTCATG CCCAGTCCAC 3840 CTCGTGCTCC TGCAGAACCT GCTGACCTGTGGAGAGCCGC CCACCTGCTC CCCGGACCAG 3900 TTTGCATGTG CCACAGGGGA GATCGACTGTATCCCCGGGG CCTGGCGCTG TGACGGCTTT 3960 CCCGAGTGCG ATGACCAGAG CGACGAGGAGGGCTGCCCCG TGTGCTCCGC CGCCCAGTTC 4020 CCCTGCGCGC GGGGTCAGTG TGTGGACCTGCGCCTGCGCT GCGACGGCGA GGCAGACTGT 4080 CAGGACCGCT CAGACGAGGC GGACTGTGACGCCATCGCCT GCCCAACCAG TTCCGGTGTG 4140 CGAGCGGCCA GTGTGTCCTC ATCAAACAGCAGTGCGACTC CTTCCCCGAC TGTATCGACG 4200 GCTCCGACGA GCTCATGTGT GAAATCACCAAGCCGCCCTC AGACGACAGC CCGGCCCACA 4260 GCAGTGCCAT CGGGCCCGTC ATTGGCATCATCCTCTCTCT CTTCGTCATG GGTGGTGTCT 4320 ATTTTGTGTG CCAGCGCGTG GTGTGCCAGCGCTATGCGGG GGCCAACGGG CCCTTCCCGC 4380 ACGAGTATGT CAGCGGGACC CCGCACGTGCCCCTCAATTT CATAGCCCCG GGCGGTTCCC 4440 AGCATGGCCC CTTCACAGGC ATCGCATGCGGAAAGTCCAT GATGAGCTCC 
GTGAGCCTGA 4500 TGGGGGGCCG GGGCGGGGTG CCCCTCTACGACCGGAACCA CGTCACAGGG GCCTCGTCCA 4560 GCAGCTCGTC CAGCACGAAG GCCACGCTGTACCCGCGGAT CCTGAACCCG CCGCCCTCCC 4620 CGGCCACGGA CCCCTCCCTG TACAACATGGACATGTTCTA CTCTTCAAAC ATTCCGGCCA 4680 CTGCGAGACC GTACAGGCCC TACATCATTCGAGGAATGGC GCCCCCGACG ACGCCCTGCA 4740 GCACCGACGT GTGTGACAGC GACTACAGCGCCAGCCGCTG GAAGGCCAGC AAGTACTACC 4800 TGGATTTGAA CTCGGACTCA GACCCCTATCCACCCCCACC CACGCCCCAC AGCCAGTACC 4860 TGTCGGCGGA GGACAGCTGC CCGCCCTCGCCCGCCACCGA GAGGAGCTAC TTCCATCTCT 4920 TCCCGCCCCC TCCGTCCCCC TGCACGGACTCATCCTGACC TCGGCCGGGC CACTCTGGCT 4980 TCTCTGTGCC CCTGTAAATA GTTTTAAATATGAACAAAGA AAAAAATATA TTTTATGATT 5040 TAAAAAATAA ATATAATTGG GATTTTAAAAACATGAGAAA TGTGAACTGT GATGGGGTGG 5100 GCAGGGCTGG GAGAACTTTG TACAGTGGAACAAATATTTA TAAACTTAAT TTTGTAAAAC 5160 AG 5162 114 base pairs nucleicacid single linear 34 CAATGTCCAG TTCCGCTGCA GTTATAACAT CCCATTTTTTGATTTCTTTT TATTTTTTCC 60 TTTTTCTTTT TGAGATGGAG TCTCGCTCTG TCACCCAGGCTGGAGTGCAA TGGG 114 1711 base pairs nucleic acid single linear 35GCCGCGGCGC CCGAGGCGGG AGCAAGAGGC GCCGGGAGCC GCGAGGATCC ACCGCCGCCG 60CGCGCGCCAT GGAGCCCGAG TGAGCGCGCG GCGCTCCCGG CCGCCGGACG ACATGGAAAC 120GGCGCCGACC CGGGCCCCTC CGCCGCCGCC GCCGCCGCTG CTGCTGCTGG TGCTGTACTG 180CAGCTTGGTC CCCGCCGCGG CCTCACCGCT CCTGTTGTTT GCCAACCGCC GGGATGTGCG 240GCTAGTGGAT GCCGGCGGAG TGAAGCTGGA GTCCACCATT GTGGCCAGTG GCCTGGAGGA 300TGCAGCTGCT GTAGACTTCC AGTTCTCCAA GGGTGCTGTG TACTGGACAG ATGTGAGCGA 360GGAGGCCATC AAACAGACCT ACCTGAACCA GACTGGAGGT GCTGCACAGA ACATTGTCAT 420CTCGGGCCTC GTGTCACCTG ATGGCCTGGC CTGTGACTGG GTTGGCAAGA AGCTGTACTG 480GACGGACTCC GAGACCAACC GCATTGAGGT TGCCAACCTC AATGGGACGT CCCGTAAGGT 540TCTCTTCTGG CAGGACCTGG ACCAGCCAAG GGCCATTGCC CTGGATCCTG CACATGGGTA 600CATGTACTGG ACTGACTGGG GGGAAGCACC CCGGATCGAG CGGGCAGGGA TGGATGGCAG 660TACCCGGAAG ATCATTGTAG ACTCCGACAT TTACTGGCCC AATGGGCTGA CCATCGACCT 720GGAGGAACAG AAGCTGTACT GGGCCGATGC CAAGCTCAGC TTCATCCACC GTGCCAACCT 780GGACGGCTCC TTCCGGCAGA AGGTGGTGGA GGGCAGCCTC ACTCACCCTT TTGCCCTGAC 840ACTCTCTGGG 
GACACACTCT ACTGGACAGA CTGGCAGACC CGCTCCATCC ACGCCTGCAA 900CAAGTGGACA GGGGAGCAGA GGAAGGAGAT CCTTAGTGCT CTGTACTCAC CCATGGACAT 960CCAAGTGCTG AGCCAGGAGC GGCAGCCTCC CTTCCACACA CCATGCGAGG AGGACAACGG 1020TGGCTGTTCC CACCTGTGCC TGCTGTCCCC GAGGGAGCCT TTCTACTCCT GTGCCTGCCC 1080CACTGGTGTG CAGTTGCAGG ACAATGGCAA GACGTGCAAG ACAGGGGCTG AGGAAGTGCT 1140GCTGCTGGCT CGGAGGACAG ACCTGAGGAG GATCTCTCTG GACACCCCTG ACTTCACAGA 1200CATAGTGCTG CAGGTGGGCG ACATCCGGCA TGCCATTGCC ATTGACTACG ATCCCCTGGA 1260GGGCTACGTG TACTGGACCG ATGATGAGGT GCGGGCTATC CGCAGGGCGT ACCTAGATGG 1320CTCAGGTGCG CAGACACTTG TGAACACTGA GATCAATGAC CCCGATGGCA TTGCTGTGGA 1380CTGGGTCGCC CGGAACCTCT ACTGGACAGA TACAGGCACT GACAGAATTG AGGTGACTCG 1440CCTCAACGGC ACCTCCCGAA AGATCCTGGT ATCTGAGGAC CTGGACGAAC CGCGAGCCAT 1500TGTGTTGCAC CCTGTGATGG GCCTCATGTA CTGGACAGAC TGGGGGGAGA ACCCCAAAAT 1560CGAATGCGCC AACCTAGATG GGAGAGATCG GCATGTCCTG GTGAACACCT CCCTTGGGTG 1620GCCCAATGGA CTGGCCCTGG ACCTGCAGGA GGGCAAGCTG TACTGGGGGG ATGCCAAAAC 1680TGATAAAATC GAGGTGATCA ACATAGACGG G 1711 200 base pairs nucleic acidsingle linear 36 GCCGCGGCGC CCGAGGCGGG AGCAAGAGGC GCCGGGAGCC GCGAGGATCCACCGCCGCCG 60 CGCGCGCCAT GGAGCCCGAG TGAGCGCGCG GCGCTCCCGG CCGCCGGACGACATGGAAAC 120 GGCGCCGACC CGGGCCCCTC CGCCGCCGCC GCCGCCGCTG CTGCTGCTGGTGCTGTACTG 180 CAGCTTGGTC CCCGCCGCGG 200 1599 base pairs nucleic acidsingle linear 37 ATGGAAACGG CGCCGACCCG GGCCCCTCCG CCGCCGCCGC CGCCGCTGCTGCTGCTGGTG 60 CTGTACTGCA GCTTGGTCCC CGCCGCGGCC TCACCGCTCC TGTTGTTTGCCAACCGCCGG 120 GATGTGCGGC TAGTGGATGC CGGCGGAGTG AAGCTGGAGT CCACCATTGTGGCCAGTGGC 180 CTGGAGGATG CAGCTGCTGT AGACTTCCAG TTCTCCAAGG GTGCTGTGTACTGGACAGAT 240 GTGAGCGAGG AGGCCATCAA ACAGACCTAC CTGAACCAGA CTGGAGGTGCTGCACAGAAC 300 ATTGTCATCT CGGGCCTCGT GTCACCTGAT GGCCTGGCCT GTGACTGGGTTGGCAAGAAG 360 CTGTACTGGA CGGACTCCGA GACCAACCGC ATTGAGGTTG CCAACCTCAATGGGACGTCC 420 CGTAAGGTTC TCTTCTGGCA GGACCTGGAC CAGCCAAGGG CCATTGCCCTGGATCCTGCA 480 CATGGGTACA TGTACTGGAC TGACTGGGGG GAAGCACCCC GGATCGAGCGGGCAGGGATG 540 GATGGCAGTA CCCGGAAGAT CATTGTAGAC TCCGACATTT 
ACTGGCCCAATGGGCTGACC 600 ATCGACCTGG AGGAACAGAA GCTGTACTGG GCCGATGCCA AGCTCAGCTTCATCCACCGT 660 GCCAACCTGG ACGGCTCCTT CCGGCAGAAG GTGGTGGAGG GCAGCCTCACTCACCCTTTT 720 GCCCTGACAC TCTCTGGGGA CACACTCTAC TGGACAGACT GGCAGACCCGCTCCATCCAC 780 GCCTGCAACA AGTGGACAGG GGAGCAGAGG AAGGAGATCC TTAGTGCTCTGTACTCACCC 840 ATGGACATCC AAGTGCTGAG CCAGGAGCGG CAGCCTCCCT TCCACACACCATGCGAGGAG 900 GACAACGGTG GCTGTTCCCA CCTGTGCCTG CTGTCCCCGA GGGAGCCTTTCTACTCCTGT 960 GCCTGCCCCA CTGGTGTGCA GTTGCAGGAC AATGGCAAGA CGTGCAAGACAGGGGCTGAG 1020 GAAGTGCTGC TGCTGGCTCG GAGGACAGAC CTGAGGAGGA TCTCTCTGGACACCCCTGAC 1080 TTCACAGACA TAGTGCTGCA GGTGGGCGAC ATCCGGCATG CCATTGCCATTGACTACGAT 1140 CCCCTGGAGG GCTACGTGTA CTGGACCGAT GATGAGGTGC GGGCTATCCGCAGGGCGTAC 1200 CTAGATGGCT CAGGTGCGCA GACACTTGTG AACACTGAGA TCAATGACCCCGATGGCATT 1260 GCTGTGGACT GGGTCGCCCG GAACCTCTAC TGGACAGATA CAGGCACTGACAGAATTGAG 1320 GTGACTCGCC TCAACGGCAC CTCCCGAAAG ATCCTGGTAT CTGAGGACCTGGACGAACCG 1380 CGAGCCATTG TGTTGCACCC TGTGATGGGC CTCATGTACT GGACAGACTGGGGGGAGAAC 1440 CCCAAAATCG AATGCGCCAA CCTAGATGGG AGAGATCGGC ATGTCCTGGTGAACACCTCC 1500 CTTGGGTGGC CCAATGGACT GGCCCTGGAC CTGCAGGAGG GCAAGCTGTACTGGGGGGAT 1560 GCCAAAACTG ATAAAATCGA GGTGATCAAC ATAGACGGG 1599 4959base pairs nucleic acid double linear 38 CCTCGCCGCT CCTGCTATTTGCCAACCGCC GGGACGTACG GCTGGTGGAC GCCGGCGGAG 60 TCAAGCTGGA GTCCACCATCGTGGTCAGCG GCCTGGAGGA TGCGGCCGCA GTGGACTTCC 120 AGTTTTCCAA GGGAGCCGTGTACTGGACAG ACGTGAGCGA GGAGGCCATC AAGCAGACCT 180 ACCTGAACCA GACGGGGGCCGCCGTGCAGA ACGTGGTCAT CTCCGGCCTG GTCTCTCCCG 240 ACGGCCTCGC CTGCGACTGGGTGGGCAAGA AGCTGTACTG GACGGACTCA GAGACCAACC 300 GCATCGAGGT GGCCAACCTCAATGGCACAT CCCGGAAGGT GCTCTTCTGG CAGGACCTTG 360 ACCAGCCGAG GGCCATCGCCTTGGACCCCG CTCACGGGTA CATGTACTGG ACAGACTGGG 420 GTGAGACGCC CCGGATTGAGCGGGCAGGGA TGGATGGCAG CACCCGGAAG ATCATTGTGG 480 ACTCGGACAT TTACTGGCCCAATGGACTGA CCATCGACCT GGAGGAGCAG AAGCTCTACT 540 GGGCTGACGC CAAGCTCAGCTTCATCCACC GTGCCAACCT GGACGGCTCG TTCCGGCAGA 600 AGGTGGTGGA GGGCAGCCTGACGCACCCCT TCGCCCTGAC GCTCTCCGGG GACACTCTGT 660 
ACTGGACAGA CTGGCAGACCCGCTCCATCC ATGCCTGCAA CAAGCGCACT GGGGGGAAGA 720 GGAAGGAGAT CCTGAGTGCCCTCTACTCAC CCATGGACAT CCAGGTGCTG AGCCAGGAGC 780 GGCAGCCTTT CTTCCACACTCGCTGTGAGG AGGACAATGG CGGCTGCTCC CACCTGTGCC 840 TGCTGTCCCC AAGCGAGCCTTTCTACACAT GCGCCTGCCC CACGGGTGTG CAGCTGCAGG 900 ACAACGGCAG GACGTGTAAGGCAGGAGCCG AGGAGGTGCT GCTGCTGGCC CGGCGGACGG 960 ACCTACGGAG GATCTCGCTGGACACGCCGG ACTTTACCGA CATCGTGCTG CAGGTGGACG 1020 ACATCCGGCA CGCCATTGCCATCGACTACG ACCCGCTAGA GGGCTATGTC TACTGGACAG 1080 ATGACGAGGT GCGGGCCATCCGCAGGGCGT ACCTGGACGG GTCTGGGGCG CAGACGCTGG 1140 TCAACACCGA GATCAACGACCCCGATGGCA TCGCGGTCGA CTGGGTGGCC CGAAACCTCT 1200 ACTGGACCGA CACGGGCACGGACCGCATCG AGGTGACGCG CCTCAACGGC ACCTCCCGCA 1260 AGATCCTGGT GTCGGAGGACCTGGACGAGC CCCGAGCCAT CGCACTGCAC CCCGTGATGG 1320 GCCTCATGTA CTGGACAGACTGGGGAGAGA ACCCTAAAAT CGAGTGTGCC AACTTGGATG 1380 GGCAGGAGCG GCGTGTGCTGGTCAATGCCT CCCTCGGGTG GCCCAACGGC CTGGCCCTGG 1440 ACCTGCAGGA GGGGAAGCTCTACTGGGGAG ACGCCAAGAC AGACAAGATC GAGGTGATCA 1500 ATGTTGATGG GACGAAGAGGCGGACCCTCC TGGAGGACAA GCTCCCGCAC ATTTTCGGGT 1560 TCACGCTGCT GGGGGACTTCATCTACTGGA CTGACTGGCA GCGCCGCAGC ATCGAGCGGG 1620 TGCACAAGGT CAAGGCCAGCCGGGACGTCA TCATTGACCA GCTGCCCGAC CTGATGGGGC 1680 TCAAAGCTGT GAATGTGGCCAAGGTCGTCG GAACCAACCC GTGTGCGGAC AGGAACGGGG 1740 GGTGCAGCCA CCTGTGCTTCTTCACACCCC ACGCAACCCG GTGTGGCTGC CCCATCGGCC 1800 TGGAGCTGCT GAGTGACATGAAGACCTGCA TCGTGCCTGA GGCCTTCTTG GTCTTCACCA 1860 GCAGAGCCGC CATCCACAGGATCTCCCTCG AGACCAATAA CAACGACGTG GCCATCCCGC 1920 TCACGGGCGT CAAGGAGGCCTCAGCCCTGG ACTTTGATGT GTCCAACAAC CACATCTACT 1980 GGACAGACGT CAGCCTGAAGACCATCAGCC GCGCCTTCAT GAACGGGAGC TCGGTGGAGC 2040 ACGTGGTGAG TTTGGCCTTGACTACCCCGA GGGCATGGCC GTTGACTGGA TGGGCAAGAA 2100 CCTCTACTGG GCCGACACTGGGACCAACAG AATCGAAGTG GCGCGGCTGG ACGGGCAGTT 2160 CCGGCAAGTC CTCGTGTGGAGGGACTTGGA CAACCCGAGG TCGCTGGCCC TGGATCCCAC 2220 CAAGGGCTAC ATCTACTGGACCGAGTGGGG CGGCAAGCCG AGGATCGTGC GGGCCTTCAT 2280 GGACGGGACC AACTGCATGACGCTGGTGGA CAAGGTGGGC CGGGCCAACG ACCTCACCAT 2340 TGACTACGCT GACCAGCGCCTCTACTGGAC CGACCTGGAC 
ACCAACATGA TCGAGTCGTC 2400 CAACATGCTG GGTCAGGAGCGGGTCGTGAT TGCCGACGAT CTCCCGCACC CGTTCGGTCT 2460 GACGCAGTAC AGCGATTATATCTACTGGAC AGACTGGAAT CTGCACAGCA TTGAGCGGGC 2520 CGACAAGACT AGCGGCCGGAACCGCACCCT CATCCAGGGC CACCTGGACT TCGTGATGGA 2580 CATCCTGGTG TTCCACTCCTCCCGCCAGGA TGGCCTCAAT GACTGTATGC ACAACAACGG 2640 GCAGTGTGGG CAGCTGTGCCTTGCCATCCC CGGCGGCCAC CGCTGCGGCT GCGCCTCACA 2700 CTACACCCTG GACCCCAGCAGCCGCAACTG CAGCCCGCCC ACCACCTTCT TGCTGTTCAG 2760 CCAGAAATCT GCCATCAGTCGGATGATCCC GGACGACCAG CACAGCCCGG ATCTCATCCT 2820 GCCCCTGCAT GGACTGAGGAACGTCAAAGC CATCGACTAT GACCCACTGG ACAAGTTCAT 2880 CTACTGGGTG GATGGGCGCCAGAACATCAA GCGAGCCAAG GACGACGGGA CCCAGCCCTT 2940 TGTTTTGACC TCTCTGAGCCAAGGCCAAAA CCCAGACAGG CAGCCCCACG ACCTCAGCAT 3000 CGACATCTAC AGCCGGACACTGTTCTGGAC GTGCGAGGCC ACCAATACCA TCAACGTCCA 3060 CAGGCTGAGC GGGGAAGCCATGGGGGTGGT GCTGCGTGGG GACCGCGACA AGCCCAGGGC 3120 CATCGTCGTC AACGCGGAGCGAGGGTACCT GTACTTCACC AACATGCAGG ACCGGGCAGC 3180 CAAGATCGAA CGCGCAGCCCTGGACGGCAC CGAGCGCGAG GTCCTCTTCA CCACCGGCCT 3240 CATCCGCCCT GTGGCCCTGGTGGTAGACAA CACACTGGGC AAGCTGTTCT GGGTGGACGC 3300 GGACCTGAAG CGCATTGAGAGCTGTGACCT GTCAGGGGCC AACCGCCTGA CCCTGGAGGA 3360 CGCCAACATC GTGCAGCCTCTGGGCCTGAC CATCCTTGGC AAGCATCTCT ACTGGATCGA 3420 CCGCCAGCAG CAGATGATCGAGCGTGTGGA GAAGACCACC GGGGACAAGC GGACTCGCAT 3480 CCAGGGCCGT GTCGCCCACCTCACTGGCAT CCATGCAGTG GAGGAAGTCA GCCTGGAGGA 3540 GTTCTCAGCC CACCCATGTGCCCGTGACAA TGGTGGCTGC TCCCACATCT GTATTGCCAA 3600 GGGTGATGGG ACACCACGGTGCTCATGCCC AGTCCACCTC GTGCTCCTGC AGAACCTGCT 3660 GACCTGTGGA GAGCCGCCCACCTGCTCCCC GGACCAGTTT GCATGTGCCA CAGGGGAGAT 3720 CGACTGTATC CCCGGGGCCTGGCGCTGTGA CGGCTTTCCC GAGTGCGATG ACCAGAGCGA 3780 CGAGGAGGGC TGCCCCGTGTGCTCCGCCGC CCAGTTCCCC TGCGCGCGGG GTCAGTGTGT 3840 GGACCTGCGC CTGCGCTGCGACGGCGAGGC AGACTGTCAG GACCGCTCAG ACGAGGCGGA 3900 CTGTGACGCC ATCTGCCTGCCCAACCAGTT CCGGTGTGCG AGCGGCCAGT GTGTCCTCAT 3960 CAAACAGCAG TGCGACTCCTTCCCCGACTG TATCGACGGC TCCGACGAGC TCATGTGTGA 4020 AATCACCAAG CCGCCCTCAGACGACAGCCC GGCCCACAGC AGTGCCATCG GGCCCGTCAT 4080 TGGCATCATC 
CTCTCTCTCTTCGTCATGGG TGGTGTCTAT TTTGTGTGCC AGCGCGTGGT 4140 GTGCCAGCGC TATGCGGGGGCCAACGGCCC TTCCCGCACG AGTATGTCAG CGGGACCCCG 4200 CACGTGCCCC TCAATTTCATAGCCCCGGGC GGTTCCCAGC ATGGCCCCTT CACAGGCATC 4260 GCATGCGGAA AGTCCATGATGAGCTCCGTG AGCCTGATGG GGGGCCGGGG CGGGGTGCCC 4320 CTCTACGACC GGAACCACGTCACAGGGGCC TCGTCCAGCA GCTCGTCCAG CACGAAGGCC 4380 ACGCTGTACC CGCCGATCCTGAACCCGCCG CCCTCCCCGG CCACGGACCC CTCCCTGTAC 4440 AACATGGACA TGTTCTACTCTTCAAACATT CCGGCCACTG TGAGACCGTA CAGGCCCTAC 4500 ATCATTCGAG GAATGGCGCCCCCGACGACG CCCTGCAGCA CCGACGTGTG TGACAGCGAC 4560 TACAGCGCCA GCCGCTGGAAGGCCAGCAAG TACTACCTGG ATTTGAACTC GGACTCAGAC 4620 CCCTATCCAC CCCCACCCACGCCCCACAGC CAGTACCTGT CGGCGGAGGA CAGCTGCCCG 4680 CCCTCGCCCG CCACCGAGAGGAGCTACTTC CATCTCTTCC CGCCCCCTCC GTCCCCCTGC 4740 ACGGACTCAT CCTGACCTCGGCCGGGCCAC TCTGGCTTCT CTGTGCCCCT GTAAATAGTT 4800 TTAAATATGA ACAAAGAAAAAAATATATTT TATGATTTAA AAAATAAATA TAATTGGGAT 4860 TTTAAAAACA TGAGAAATGTGAACTGTGAT GGGGTGGGCA GGGCTGGGAG AACTTTGTAC 4920 AGTGGAACAA ATATTTATAAACTTAATTTT GTAAAACAG 4959 1584 amino acids amino acid linear 39 Ser ProLeu Leu Leu Phe Ala Asn Arg Arg Asp Val Arg Leu Val Asp 1 5 10 15 AlaGly Gly Val Lys Leu Glu Ser Thr Ile Val Val Ser Gly Leu Glu 20 25 30 AspAla Ala Ala Val Asp Phe Gln Phe Ser Lys Gly Ala Val Tyr Trp 35 40 45 ThrAsp Val Ser Glu Glu Ala Ile Lys Gln Thr Tyr Leu Asn Gln Thr 50 55 60 GlyAla Ala Val Gln Asn Val Val Ile Ser Gly Leu Val Ser Pro Asp 65 70 75 80Gly Leu Ala Cys Asp Trp Val Gly Lys Lys Leu Tyr Trp Thr Asp Ser 85 90 95Glu Thr Asn Arg Ile Glu Val Ala Asn Leu Asn Gly Thr Ser Arg Lys 100 105110 Val Leu Phe Trp Gln Asp Leu Asp Gln Pro Arg Ala Ile Ala Leu Asp 115120 125 Pro Ala His Gly Tyr Met Tyr Trp Thr Asp Trp Gly Glu Thr Pro Arg130 135 140 Ile Glu Arg Ala Gly Met Asp Gly Ser Thr Arg Lys Ile Ile ValAsp 145 150 155 160 Ser Asp Ile Tyr Trp Pro Asn Gly Leu Thr Ile Asp LeuGlu Glu Gln 165 170 175 Lys Leu Tyr Trp Ala Asp Ala Lys Leu Ser Phe IleHis Arg Ala Asn 180 185 190 Leu Asp Gly Ser Phe Arg Gln Lys Val Val GluGly Ser Leu 
Thr His 195 200 205 Pro Phe Ala Leu Thr Leu Ser Gly Asp ThrLeu Tyr Trp Thr Asp Trp 210 215 220 Gln Thr Arg Ser Ile His Ala Cys AsnLys Arg Thr Gly Gly Lys Arg 225 230 235 240 Lys Glu Ile Leu Ser Ala LeuTyr Ser Pro Met Asp Ile Gln Val Leu 245 250 255 Ser Gln Glu Arg Gln ProPhe Phe His Thr Arg Cys Glu Glu Asp Asn 260 265 270 Gly Gly Cys Ser HisLeu Cys Leu Leu Ser Pro Ser Glu Pro Phe Tyr 275 280 285 Thr Cys Ala CysPro Thr Gly Val Gln Leu Gln Asp Asn Gly Arg Thr 290 295 300 Cys Lys AlaGly Ala Glu Glu Val Leu Leu Leu Ala Arg Arg Thr Asp 305 310 315 320 LeuArg Arg Ile Ser Leu Asp Thr Pro Asp Phe Thr Asp Ile Val Leu 325 330 335Gln Val Asp Asp Ile Arg His Ala Ile Ala Ile Asp Tyr Asp Pro Leu 340 345350 Glu Gly Tyr Val Tyr Trp Thr Asp Asp Glu Val Arg Ala Ile Arg Arg 355360 365 Ala Tyr Leu Asp Gly Ser Gly Ala Gln Thr Leu Val Asn Thr Glu Ile370 375 380 Asn Asp Pro Asp Gly Ile Ala Val Asp Trp Val Ala Arg Asn LeuTyr 385 390 395 400 Trp Thr Asp Thr Gly Thr Asp Arg Ile Glu Val Thr ArgLeu Asn Gly 405 410 415 Thr Ser Arg Lys Ile Leu Val Ser Glu Asp Leu AspGlu Pro Arg Ala 420 425 430 Ile Ala Leu His Pro Val Met Gly Leu Met TyrTrp Thr Asp Trp Gly 435 440 445 Glu Asn Pro Lys Ile Glu Cys Ala Asn LeuAsp Gly Gln Glu Arg Arg 450 455 460 Val Leu Val Asn Ala Ser Leu Gly TrpPro Asn Gly Leu Ala Leu Asp 465 470 475 480 Leu Gln Glu Gly Lys Leu TyrTrp Gly Asp Ala Lys Thr Asp Lys Ile 485 490 495 Glu Val Ile Asn Val AspGly Thr Lys Arg Arg Thr Leu Leu Glu Asp 500 505 510 Lys Leu Pro His IlePhe Gly Phe Thr Leu Leu Gly Asp Phe Ile Tyr 515 520 525 Trp Thr Asp TrpGln Arg Arg Ser Ile Glu Arg Val His Lys Val Lys 530 535 540 Ala Ser ArgAsp Val Ile Ile Asp Gln Leu Pro Asp Leu Met Gly Leu 545 550 555 560 LysAla Val Asn Val Ala Lys Val Val Gly Thr Asn Pro Cys Ala Asp 565 570 575Arg Asn Gly Gly Cys Ser His Leu Cys Phe Phe Thr Pro His Ala Thr 580 585590 Arg Cys Gly Cys Pro Ile Gly Leu Glu Leu Leu Ser Asp Met Lys Thr 595600 605 Cys Ile Val Pro Glu Ala Phe Leu Val Phe Thr Ser Arg Ala Ala Ile610 615 620 His Arg Ile 
Ser Leu Glu Thr Asn Asn Asn Asp Val Ala Ile ProLeu 625 630 635 640 Thr Gly Val Lys Glu Ala Ser Ala Leu Asp Phe Asp ValSer Asn Asn 645 650 655 His Ile Tyr Trp Thr Asp Val Ser Leu Lys Thr IleSer Arg Ala Phe 660 665 670 Met Asn Gly Ser Ser Val Glu His Val Val GluPhe Gly Leu Asp Tyr 675 680 685 Pro Glu Gly Met Ala Val Asp Trp Met GlyLys Asn Leu Tyr Trp Ala 690 695 700 Asp Thr Gly Thr Asn Arg Ile Glu ValAla Arg Leu Asp Gly Gln Phe 705 710 715 720 Arg Gln Val Leu Val Trp ArgAsp Leu Asp Asn Pro Arg Ser Leu Ala 725 730 735 Leu Asp Pro Thr Lys GlyTyr Ile Tyr Trp Thr Glu Trp Gly Gly Lys 740 745 750 Pro Arg Ile Val ArgAla Phe Met Asp Gly Thr Asn Cys Met Thr Leu 755 760 765 Val Asp Lys ValGly Arg Ala Asn Asp Leu Thr Ile Asp Tyr Ala Asp 770 775 780 Gln Arg LeuTyr Trp Thr Asp Leu Asp Thr Asn Met Ile Glu Ser Ser 785 790 795 800 AsnMet Leu Gly Gln Glu Arg Val Val Ile Ala Asp Asp Leu Pro His 805 810 815Pro Phe Gly Leu Thr Gln Tyr Ser Asp Tyr Ile Tyr Trp Thr Asp Trp 820 825830 Asn Leu His Ser Ile Glu Arg Ala Asp Lys Thr Ser Gly Arg Asn Arg 835840 845 Thr Leu Ile Gln Gly His Leu Asp Phe Val Met Asp Ile Leu Val Phe850 855 860 His Ser Ser Arg Gln Asp Gly Leu Asn Asp Cys Met His Asn AsnGly 865 870 875 880 Gln Cys Gly Gln Leu Cys Leu Ala Ile Pro Gly Gly HisArg Cys Gly 885 890 895 Cys Ala Ser His Tyr Thr Leu Asp Pro Ser Ser ArgAsn Cys Ser Pro 900 905 910 Pro Thr Thr Phe Leu Leu Phe Ser Gln Lys SerAla Ile Ser Arg Met 915 920 925 Ile Pro Asp Asp Gln His Ser Pro Asp LeuIle Leu Pro Leu His Gly 930 935 940 Leu Arg Asn Val Lys Ala Ile Asp TyrAsp Pro Leu Asp Lys Phe Ile 945 950 955 960 Tyr Trp Val Asp Gly Arg GlnAsn Ile Lys Arg Ala Lys Asp Asp Gly 965 970 975 Thr Gln Pro Phe Val LeuThr Ser Leu Ser Gln Gly Gln Asn Pro Asp 980 985 990 Arg Gln Pro His AspLeu Ser Ile Asp Ile Tyr Ser Arg Thr Leu Phe 995 1000 1005 Trp Thr CysGlu Ala Thr Asn Thr Ile Asn Val His Arg Leu Ser Gly 1010 1015 1020 GluAla Met Gly Val Val Leu Arg Gly Asp Arg Asp Lys Pro Arg Ala 1025 10301035 1040 Ile Val Val Asn Ala Glu Arg 
Gly Tyr Leu Tyr Phe Thr Asn MetGln 1045 1050 1055 Asp Arg Ala Ala Lys Ile Glu Arg Ala Ala Leu Asp GlyThr Glu Arg 1060 1065 1070 Glu Val Leu Phe Thr Thr Gly Leu Ile Arg ProVal Ala Leu Val Val 1075 1080 1085 Asp Asn Thr Leu Gly Lys Leu Phe TrpVal Asp Ala Asp Leu Lys Arg 1090 1095 1100 Ile Glu Ser Cys Asp Leu SerGly Ala Asn Arg Leu Thr Leu Glu Asp 1105 1110 1115 1120 Ala Asn Ile ValGln Pro Leu Gly Leu Thr Ile Leu Gly Lys His Leu 1125 1130 1135 Tyr TrpIle Asp Arg Gln Gln Gln Met Ile Glu Arg Val Glu Lys Thr 1140 1145 1150Thr Gly Asp Lys Arg Thr Arg Ile Gln Gly Arg Val Ala His Leu Thr 11551160 1165 Gly Ile His Ala Val Glu Glu Val Ser Leu Glu Glu Phe Ser AlaHis 1170 1175 1180 Pro Cys Ala Arg Asp Asn Gly Gly Cys Ser His Ile CysIle Ala Lys 1185 1190 1195 1200 Gly Asp Gly Thr Pro Arg Cys Ser Cys ProVal His Leu Val Leu Leu 1205 1210 1215 Gln Asn Leu Leu Thr Cys Gly GluPro Pro Thr Cys Ser Pro Asp Gln 1220 1225 1230 Phe Ala Cys Ala Thr GlyGlu Ile Asp Cys Ile Pro Gly Ala Trp Arg 1235 1240 1245 Cys Asp Gly PhePro Glu Cys Asp Asp Gln Ser Asp Glu Glu Gly Cys 1250 1255 1260 Pro ValCys Ser Ala Ala Gln Phe Pro Cys Ala Arg Gly Gln Cys Val 1265 1270 12751280 Asp Leu Arg Leu Arg Cys Asp Gly Glu Ala Asp Cys Gln Asp Arg Ser1285 1290 1295 Asp Glu Ala Asp Cys Asp Ala Ile Cys Leu Pro Asn Gln PheArg Cys 1300 1305 1310 Ala Ser Gly Gln Cys Val Leu Ile Lys Gln Gln CysAsp Ser Phe Pro 1315 1320 1325 Asp Cys Ile Asp Gly Ser Asp Glu Leu MetCys Glu Ile Thr Lys Pro 1330 1335 1340 Pro Ser Asp Asp Ser Pro Ala HisSer Ser Ala Ile Gly Pro Val Ile 1345 1350 1355 1360 Gly Ile Ile Leu SerLeu Phe Val Met Gly Gly Val Tyr Phe Val Cys 1365 1370 1375 Gln Arg ValVal Cys Gln Arg Tyr Ala Gly Ala Asn Gly Pro Phe Pro 1380 1385 1390 HisGlu Tyr Val Ser Gly Thr Pro His Val Pro Leu Asn Phe Ile Ala 1395 14001405 Pro Gly Gly Ser Gln His Gly Pro Phe Thr Gly Ile Ala Cys Gly Lys1410 1415 1420 Ser Met Met Ser Ser Val Ser Leu Met Gly Gly Arg Gly GlyVal Pro 1425 1430 1435 1440 Leu Tyr Asp Arg Asn His Val Thr Gly Ala SerSer Ser Ser 
Ser Ser 1445 1450 1455 Ser Thr Lys Ala Thr Leu Tyr Pro ProIle Leu Asn Pro Pro Pro Ser 1460 1465 1470 Pro Ala Thr Asp Pro Ser LeuTyr Asn Met Asp Met Phe Tyr Ser Ser 1475 1480 1485 Asn Ile Pro Ala ThrVal Arg Pro Tyr Arg Pro Tyr Ile Ile Arg Gly 1490 1495 1500 Met Ala ProPro Thr Thr Pro Cys Ser Thr Asp Val Cys Asp Ser Asp 1505 1510 1515 1520Tyr Ser Ala Ser Arg Trp Lys Ala Ser Lys Tyr Tyr Leu Asp Leu Asn 15251530 1535 Ser Asp Ser Asp Pro Tyr Pro Pro Pro Pro Thr Pro His Ser GlnTyr 1540 1545 1550 Leu Ser Ala Glu Asp Ser Cys Pro Pro Ser Pro Ala ThrGlu Arg Ser 1555 1560 1565 Tyr Phe His Leu Phe Pro Pro Pro Pro Ser ProCys Thr Asp Ser Ser 1570 1575 1580 5117 base pairs nucleic acid singlelinear 40 GCCGCGGCGC CCGAGGCGGG AGCAAGAGGC GCCGGGAGCC GCGAGGATCCACCGCCGCCG 60 CGCGCGCCAT GGAGCCCGAG TGAGCGCGCG GCGCTCCCGG CCGCCGGACGACATGGAAAC 120 GGCGCCGACC CGGGCCCCTC CGCCGCCGCC GCCGCCGCTG CTGCTGCTGGTGCTGTACTG 180 CAGCTTGGTC CCCGCCGCGG CCTCACCGCT CCTGTTGTTT GCCAACCGCCGGGATGTGCG 240 GCTAGTGGAT GCCGGCGGAG TGAAGCTGGA GTCCACCATT GTGGCCAGTGGCCTGGAGGA 300 TGCAGCTGCT GTAGACTTCC AGTTCTCCAA GGGTGCTGTG TACTGGACAGATGTGAGCGA 360 GGAGGCCATC AAACAGACCT ACCTGAACCA GACTGGAGCT GCTGCACAGAACATTGTCAT 420 CTCGGGCCTC GTGTCACCTG ATGGCCTGGC CTGTGACTGG GTTGGCAAGAAGCTGTACTG 480 GACGGACTCC GAGACCAACC GCATTGAGGT TGCCAACCTC AATGGGACGTCCCGTAAGGT 540 TCTCTTCTGG CAGGACCTGG ACCAGCCAAG GGCCATTGCC CTGGATCCTGCACATGGGTA 600 CATGTACTGG ACTGACTGGG GGGAAGCACC CCGGATCGAG CGGGCAGGGATGGATGGCAG 660 TACCCGGAAG ATCATTGTAG ACTCCGACAT TTACTGGCCC AATGGGCTGACCATCGACCT 720 GGAGGAACAG AAGCTGTACT GGGCCGATGC CAAGCTCAGC TTCATCCACCGTGCCAACCT 780 GGACGGCTCC TTCCGGCAGA AGGTGGTGGA GGGCAGCCTC ACTCACCCTTTTGCCCTGAC 840 ACTCTCTGGG GACACACTCT ACTGGACAGA CTGGCAGACC CGCTCCATCCACGCCTGCAA 900 CAAGTGGACA GGGGAGCAGA GGAAGGAGAT CCTTAGTGCT CTGTACTCACCCATGGACAT 960 CCAAGTGCTG AGCCAGGAGC GGCAGCCTCC CTTCCACACA CCATGCGAGGAGGACAACGG 1020 TGGCTGTTCC CACCTGTGCC TGCTGTCCCC GAGGGAGCCT TTCTACTCCTGTGCCTGCCC 1080 CACTGGTGTG CAGTTGCAGG ACAATGGCAA GACGTGCAAG 
ACAGGGGCTGAGGAAGTGCT 1140 GCTGCTGGCT CGGAGGACAG ACCTGAGGAG GATCTCTCTG GACACCCCTGACTTCACAGA 1200 CATAGTGCTG CAGGTGGGCG ACATCCGGCA TGCCATTGCC ATTGACTACGATCCCCTGGA 1260 GGGCTACGTG TACTGGACCG ATGATGAGGT GCGGGCTATC CGCAGGGCGTACCTAGATGG 1320 CTCAGGTGCG CAGACACTTG TGAACACTGA GATCAATGAC CCCGATGGCATTGCTGTGGA 1380 CTGGGTCGCC CGGAACCTCT ACTGGACAGA TACAGGCACT GACAGAATTGAGGTGACTCG 1440 CCTCAACGGC ACCTCCCGAA AGATCCTGGT ATCTGAGGAC CTGGACGAACCGCGAGCCAT 1500 TGTGTTGCAC CCTGTGATGG GCCTCATGTA CTGGACAGAC TGGGGGGAGAACCCCAAAAT 1560 CGAATGCGCC AACCTAGATG GGAGAGATCG GCATGTCCTG GTGAACACCTCCCTTGGGTG 1620 GCCCAATGGA CTGGCCCTGG ACCTGCAGGA GGGCAAGCTG TACTGGGGGGATGCCAAAAC 1680 TGATAAAATC GAGGTGATCA ACATAGACGG GACAAAGCGG AAGACCCTGCTTGAGGACAA 1740 GCTCCCACAC ATTTTTGGGT TCACACTGCT GGGGGACTTC ATCTACTGGACCGACTGGCA 1800 GAGACGCAGT ATTGAAAGGG TCCACAAGGT CAAGGCCAGC CGGGATGTCATCATTGATCA 1860 ACTCCCCGAC CTGATGGGAC TCAAAGCCGT GAATGTGGCC AAGGTTGTCGGAACCAACCC 1920 ATGTGCGGAT GGAAATGGAG GGTGCAGCCA TCTGTGCTTC TTCACCCCACGTGCCACCAA 1980 GTGTGGCTGC CCCATTGGCC TGGAGCTGTT GAGTGACATG AAGACCTGCATAATCCCCGA 2040 GGCCTTCCGG TATTCACCAG CAGAGCCACC ATCCACAGGA TCTCCCTGGAGACTAACAAC 2100 AACGATGTGG CTATCCCACT CACGGGTGTC AAAGAGGCCT CTGCACTGGACTTTGATGTG 2160 TCCAACAATC ACATCTACTG GACTGATGTT AGCCTCAAGA CGATCAGCCGAGCCTTCATG 2220 AATGGGAGCT CAGTGGAGCA CGTGATTGAG TTTGGCCTCG ACTACCCTGAAGGAATGGCT 2280 GTGGACTGGA TGGGCAAGAA CCTCTATTGG GCGGACACAG GGACCAACAGGATTGAGGTG 2340 GCCCGGCTGG ATGGGCAGTT CCGGCAGGTG CTTGTGTGGA GAGACCTTGACAACCCCAGG 2400 TCTCTGGCTC TGGATCCTAC TAAAGGCTAC ATCTACTGGA CTGAGTGGGGTGGCAAGCCA 2460 AGGATTGTGC GGGCCTTCAT GGATGGGACC AATTGTATGA CACTGGTAGACAAGGTGGGC 2520 CGGGCCAACG ACCTCACCAT TGATTATGCC GACCAGCGAC TGTACTGGACTGACCTGGAC 2580 ACCAACATGA TTGAGTCTTC CAACATGCTG GGTCAGGAGC GCATGGTGATAGCTGACGAT 2640 CTGCCCTACC CGTTTGGCCT GACTCAATAT AGCGATTACA TCTACTGGACTGACTGGAAC 2700 CTGCATAGCA TTGAACGGGC GGACAAGACC AGTGGGCGGA ACCGCACCCTCATCCAGGGT 2760 CACCTGGACT TCGTCATGGA CATCCTGGTG TTCCACTCCT CCCGTCAGGATGGCCTCAAC 2820 GACTGCGTGC 
ACAGCAATGG CCAGTGTGGG CAGCTGTGCC TCGCCATCCCCGGAGGCCAC 2880 CGCTGTGGCT GTGCTTCACA CTACACGCTG GACCCCAGCA GCCGCAACTGCAGCCCGCCC 2940 TCCACCTTCT TGCTGTTCAG CCAGAAATTT GCCATCAGCC GGATGATCCCCGATGACCAG 3000 CTCAGCCCGG ACCTTGTCCT ACCCCTTCAT GGGCTGAGGA ACGTCAAAGCCATCAACTAT 3060 GACCCGCTGG ACAAGTTCAT CTACTGGGTG GACGGGCGCC AGAACATCAAGAGGGCCAAG 3120 GACGACGGTA CCCAGCCCTC CATGCTGACC TCTCCCAGCC AAAGCCTGAGCCCAGACAGA 3180 CAGCCACACG ACCTCAGCAT TGACATCTAC AGCCGGACAC TGTTCTGGACCTGTGAGGCC 3240 ACCAACACTA TCAATGTCCA CCGGCTGGAT GGGGATGCCA TGGGAGTGGTGCTTCGAGGG 3300 GACCGTGACA AGCCAAGGGC CATTGCTGTC AATGCTGAGC GAGGGTACATGTACTTTACC 3360 AACATGCAGG ACCATGCTGC CAAGATCGAG CGAGCCTCCC TGGATGGCACAGAGCGGGAG 3420 GTCCTCTTCA CCACAGGCCT CATCCGTCCC GTGGCCCTTG TGGTGGACAATGCTCTGGGC 3480 AAGCTCTTCT GGGTGGATGC CGACCTAAAG CGAATCGAAA GCTGTGACCTCTCTGGGGCC 3540 AACCGCCTGA CCCTGGAAGA TGCCAACATC GTACAGCCAG TAGGTCTGACAGTGCTGGGC 3600 AGGCACCTCT ACTGGATCGA CCGCCAGCAG CAGATGATCG AGCGCGTGGAGAAGACCACT 3660 GGGGACAAGC GGACTAGGGT TCAGGGCCGT GTCACCCACC TGACAGGCATCCATGCCGTG 3720 GAGGAAGTCA GCCTGGAGGA GTTCTCAGCC CATCCTTGTG CCCGAGACAATGGCGGCTGC 3780 TCCCACATCT GTATCGCCAA GGGTGATGGA ACACCGCGCT GCTCGTGCCCTGTCCACCTG 3840 GTGCTCCTGC AGAACCTGCT GACTTGTGGT GAGCCTCCTA CCTGCTCCCCTGATCAGTTT 3900 GCATGTACCA CTGGTGAGAT CGACTGCATC CCCGGAGCCT GGCGCTGTGACGGCTTCCCT 3960 GAGTGTGCTG ACCAGAGTGA TGAAGAAGGC TGCCCAGTGT GCTCCGCCTCTCAGTTCCCC 4020 TGCGCTCGAG GCCAGTGTGT GGACCTGCGG TTACGCTGCG ACGGTGAGGCCGACTGCCAG 4080 GATCGCTCTG ATGAAGTAAC TGCGATGCTG TCTGTCTGCC CAATCAGTTCCGGTGCACCA 4140 GCGGCCAGTG TGTCCTCATC AAGCAACAGT GTGACTCCTT CCCCGACTGTGCTGATGGGT 4200 CTGATGAGCT CATGTGTGAA ATCAACAAGC CACCCTCTGA TGACATCCCAGCCCACAGCA 4260 GTGCCATTGG GCCCGTCATT GGTATCATCC TCTCCCTCTT CGTCATGGGCGGGGTCTACT 4320 TTGTCTGCCA GCGTGTGATG TGCCAGCGCT ACACAGGGGC CAGTGGGCCCTTTCCCCACG 4380 AGTATGTTGG TGGAGCCCCT CATGTGCCTC TCAACTTCAT AGCCCCAGGTGGCTCACAGC 4440 ACGGTCCCTT CCCAGGCATC CCGTGCAGCA AGTCCGTGAT GAGCTCCATGAGCCTGGTGG 4500 GGGGGCGCGG CAGCGTGCCC CTCTATGACC GGAATCACGT 
CACTGGGGCCTCATCCAGCA 4560 GCTCGTCCAG CACAAAGGCC ACACTATATC CGCCGATCCT GAACCCACCCCCGTCCCCGG 4620 CCACAGACCC CTCTCTCTAC AACGTGGACG TGTTTTATTC TTCAGGCATCCCGGCCACCG 4680 CTAGACCATA CAGGCCCTAC GTCATTCGAG GTATGGCACC CCCAACAACACCGTGCAGCA 4740 CAGATGTGTG TGACAGTGAC TACAGCATCA GTCGCTGGAA GAGCAGCAAATACTACCTGG 4800 ACTTGAATTC GGACTCAGAC CCCTACCCCC CCCCGCCCAC CCCCCACAGCCAGTACCTAT 4860 CTGCAGAGGA CAGCTGCCCA CCCTCACCAG GCACTGAGAG GAGTTACTGCCACCTCTTCC 4920 CGCCCCCACC GTCCCCCTGC ACGGACTCGT CCTGACCTCG GCCGTCCACCCGGCCCTGCT 4980 GCCTCCCTGT AAATATTTTT AAATATGAAC AAAGGAAAAA TATATTTTATGATTTAAAAA 5040 ATAAATATAA TTGGGGTTTT TAACAAGTGA GAAATGTGAG CGGTGAAGGGGTGGGCAGGG 5100 CTGGGAAACT TTTCTAG 5117 4843 base pairs nucleic acidsingle linear 41 ATGGAAACGG CGCCGACCCG GGCCCCTCCG CCGCCGCCGC CGCCGCTGCTGCTGCTGGTG 60 CTGTACTGCA GCTTGGTCCC CGCCGCGGCC TCACCGCTCC TGTTGTTTGCCAACCGCCGG 120 GATGTGCGGC TAGTGGATGC CGGCGGAGTG AAGCTGGAGT CCACCATTGTGGCCAGTGGC 180 CTGGAGGATG CAGCTGCTGT AGACTTCCAG TTCTCCAAGG GTGCTGTGTACTGGACAGAT 240 GTGAGCGAGG AGGCCATCAA ACAGACCTAC CTGAACCAGA CTGGAGCTGCTGCACAGAAC 300 ATTGTCATCT CGGGCCTCGT GTCACCTGAT GGCCTGGCCT GTGACTGGGTTGGCAAGAAG 360 CTGTACTGGA CGGACTCCGA GACCAACCGC ATTGAGGTTG CCAACCTCAATGGGACGTCC 420 CGTAAGGTTC TCTTCTGGCA GGACCTGGAC CAGCCAAGGG CCATTGCCCTGGATCCTGCA 480 CATGGGTACA TGTACTGGAC TGACTGGGGG GAAGCACCCC GGATCGAGCGGGCAGGGATG 540 GATGGCAGTA CCCGGAAGAT CATTGTAGAC TCCGACATTT ACTGGCCCAATGGGCTGACC 600 ATCGACCTGG AGGAACAGAA GCTGTACTGG GCCGATGCCA AGCTCAGCTTCATCCACCGT 660 GCCAACCTGG ACGGCTCCTT CCGGCAGAAG GTGGTGGAGG GCAGCCTCACTCACCCTTTT 720 GCCCTGACAC TCTCTGGGGA CACACTCTAC TGGACAGACT GGCAGACCCGCTCCATCCAC 780 GCCTGCAACA AGTGGACAGG GGAGCAGAGG AAGGAGATCC TTAGTGCTCTGTACTCACCC 840 ATGGACATCC AAGTGCTGAG CCAGGAGCGG CAGCCTCCCT TCCACACACCATGCGAGGAG 900 GACAACGGTG GCTGTTCCCA CCTGTGCCTG CTGTCCCCGA GGGAGCCTTTCTACTCCTGT 960 GCCTGCCCCA CTGGTGTGCA GTTGCAGGAC AATGGCAAGA CGTGCAAGACAGGGGCTGAG 1020 GAAGTGCTGC TGCTGGCTCG GAGGACAGAC CTGAGGAGGA TCTCTCTGGACACCCCTGAC 1080 TTCACAGACA TAGTGCTGCA 
GGTGGGCGAC ATCCGGCATG CCATTGCCATTGACTACGAT 1140 CCCCTGGAGG GCTACGTGTA CTGGACCGAT GATGAGGTGC GGGCTATCCGCAGGGCGTAC 1200 CTAGATGGCT CAGGTGCGCA GACACTTGTG AACACTGAGA TCAATGACCCCGATGGCATT 1260 GCTGTGGACT GGGTCGCCCG GAACCTCTAC TGGACAGATA CAGGCACTGACAGAATTGAG 1320 GTGACTCGCC TCAACGGCAC CTCCCGAAAG ATCCTGGTAT CTGAGGACCTGGACGAACCG 1380 CGAGCCATTG TGTTGCACCC TGTGATGGGC CTCATGTACT GGACAGACTGGGGGGAGAAC 1440 CCCAAAATCG AATGCGCCAA CCTAGATGGG AGAGATCGGC ATGTCCTGGTGAACACCTCC 1500 CTTGGGTGGC CCAATGGACT GGCCCTGGAC CTGCAGGAGG GCAAGCTGTACTGGGGGGAT 1560 GCCAAAACTG ATAAAATCGA GGTGATCAAC ATAGACGGGA CAAAGCGGAAGACCCTGCTT 1620 GAGGACAAGC TCCCACACAT TTTTGGGTTC ACACTGCTGG GGGACTTCATCTACTGGACC 1680 GACTGGCAGA GACGCAGTAT TGAAAGGGTC CACAAGGTCA AGGCCAGCCGGGATGTCATC 1740 ATTGATCAAC TCCCCGACCT GATGGGACTC AAAGCCGTGA ATGTGGCCAAGGTTGTCGGA 1800 ACCAACCCAT GTGCGGATGG AAATGGAGGG TGCAGCCATC TGTGCTTCTTCACCCCACGT 1860 GCCACCAAGT GTGGCTGCCC CATTGGCCTG GAGCTGTTGA GTGACATGAAGACCTGCATA 1920 ATCCCCGAGG CCTTCCTGGT ATTCACCAGC AGAGCCACCA TCCACAGGATCTCCCTGGAG 1980 ACTAACAACA ACGATGTGGC TATCCCACTC ACGGGTGTCA AAGAGGCCTCTGCACTGGAC 2040 TTTGATGTTC CAACAATCAC ATCTACTGGA CTGATGTTAG CCTCAAGACGATCAGCCGAG 2100 CCTTCATGAA TGGGAGCTCA GTGGAGCACG TGATTGAGTT TGGCCTCGACTACCCTGAAG 2160 GAATGGCTGT GGACTGGATG GGCAAGAACC TCTATTGGGC GGACACAGGGACCAACAGGA 2220 TTGAGGTGGC CCGGCTGGAT GGGCAGTTCC GGCAGGTGCT TGTGTGGAGAGACCTTGACA 2280 ACCCCAGGTC TCTGGCTCTG GATCCTACTA AAGGCTACAT CTACTGGACTGAGTGGGGTG 2340 GCAAGCCAAG GATTGTGCGG GCCTTCATGG ATGGGACCAA TTGTATGACACTGGTAGACA 2400 AGGTGGGCCG GGCCAACGAC CTCACCATTG ATTATGCCGA CCAGCGACTGTACTGGACTG 2460 ACCTGGACAC CAACATGATT GAGTCTTCCA ACATGCTGGG TCAGGAGCGCATGGTGATAG 2520 CTGACGATCT GCCCTACCCG TTTGGCCTGA CTCAATATAG CGATTACATCTACTGGACTG 2580 ACTGGAACCT GCATAGCATT GAACGGGCGG ACAAGACCAG TGGGCGGAACCGCACCCTCA 2640 TCCAGGGTCA CCTGGACTTC GTCATGGACA TCCTGGTGTT CCACTCCTCCCGTCAGGATG 2700 GCCTCAACGA CTGCGTGCAC AGCAATGGCC AGTGTGGGCA GCTGTGCCTCGCCATCCCCG 2760 GAGGCCACCG CTGTGGCTGT GCTTCACACT ACACGCTGGA 
CCCCAGCAGCCGCAACTGCA 2820 GCCCGCCCTC CACCTTCTTG CTGTTCAGCC AGAAATTTGC CATCAGCCGGATGATCCCCG 2880 ATGACCAGCT CAGCCCGGAC CTTGTCCTAC CCCTTCATGG GCTGAGGAACGTCAAAGCCA 2940 TCAACTATGA CCCGCTGGAC AAGTTCATCT ACTGGGTGGA CGGGCGCCAGAACATCAAGA 3000 GGGCCAAGGA CGACGGTACC CAGCCCTCCA TGCTGACCTC TCCCAGCCAAAGCCTGAGCC 3060 CAGACAGACA GCCACACGAC CTCAGCATTG ACATCTACAG CCGGACACTGTTCTGGACCT 3120 GTGAGGCCAC CAACACTATC AATGTCCACC GGCTGGATGG GGATGCCATGGGAGTGGTGC 3180 TTCGAGGGGA CCGTGACAAG CCAAGGGCCA TTGCTGTCAA TGCTGAGCGAGGGTACATGT 3240 ACTTTACCAA CATGCAGGAC CATGCTGCCA AGATCGAGCG AGCCTCCCTGGATGGCACAG 3300 AGCGGGAGGT CCTCTTCACC ACAGGCCTCA TCCGTCCCGT GGCCCTTGTGGTGGACAATG 3360 CTCTGGGCAA GCTCTTCTGG GTGGATGCCG ACCTAAAGCG AATCGAAAGCTGTGACCTCT 3420 CTGGGGCCAA CCGCCTGACC CTGGAAGATG CCAACATCGT ACAGCCAGTAGGTCTGACAG 3480 TGCTGGGCAG GCACCTCTAC TGGATCGACC GCCAGCAGCA GATGATCGAGCGCGTGGAGA 3540 AGACCACTGG GGACAAGCGG ACTAGGGTTC AGGGCCGTGT CACCCACCTGACAGGCATCC 3600 ATGCCGTGGA GGAAGTCAGC CTGGAGGAGT TCTCAGCCCA TCCTTGTGCCCGAGACAATG 3660 GCGGCTGCTC CCACATCTGT ATCGCCAAGG GTGATGGAAC ACCGCGCTGCTCGTGCCCTG 3720 TCCACCTGGT GCTCCTGCAG AACCTGCTGA CTTGTGGTGA GCCTCCTACCTGCTCCCCTG 3780 ATCAGTTTGC ATGTACCACT GGTGAGATCG ACTGCATCCC CGGAGCCTGGCGCTGTGACG 3840 GCTTCCCTGA GTGTGCTGAC CAGAGTGATG AAGAAGGCTG CCCAGTGTGCTCCGCCTCTC 3900 AGTTCCCCTG CGCTCGAGGC CAGTGTGTGG ACCTGCGGTT ACGCTGCGACGGTGAGGCCG 3960 ACTGCCAGGA TCGCTCTGAT GAAGCTAACT GCGATGCTGT CTGTCTGCCCAATCAGTTCC 4020 GGTGCACCAG CGGCCAGTGT GTCCTCATCA AGCAACAGTG TGACTCCTTCCCCGACTGTG 4080 CTGATGGGTC TGATGACTCA TGTGTGAAAT CAACAAGCCA CCCTCTGATGACATCCCAGC 4140 CCACAGCAGT GCCATTGGGC CCGTCATTGG TATCATCCTC TCCCTCTTCGTCATGGGCGG 4200 GGTCTACTTT GTCTGCCAGC GTGTGATGTG CCAGCGCTAC ACAGGGGCCAGTGGGCCCTT 4260 TCCCCACGAG TATGTTGGTG GAGCCCCTCA TGTGCCTCTC AACTTCATAGCCCCAGGTGG 4320 CTCACAGCAC GGTCCCTTCC CAGGCATCCC GTGCAGCAAG TCCGTGATGAGCTCCATGAG 4380 CCTGGTGGGG GGGCGCGGCA GCGTGCCCCT CTATGACCGG AATCACGTCACTGGGGCCTC 4440 ATCCAGCAGC TCGTCCAGCA CAAAGGCCAC ACTATATCCG CCGATCCTGAACCCACCCCC 4500 GTCCCCGGCC 
ACAGACCCCT CTCTCTACAA CGTGGACGTG TTTTATTCTTCAGGCATCCC 4560 GGCCACCGCT AGACCATACA GGCCCTACGT CATTCGAGGT ATGGCACCCCCAACAACACC 4620 GTGCAGCACA GATGTGTGTG ACAGTGACTA CAGCATCAGT CGCTGGAAGAGCAGCAAATA 4680 CTACCTGGAC TTGAATTCGG ACTCAGACCC CTACCCCCCC CCGCCCACCCCCCACAGCCA 4740 GTACCTATCT GCAGAGGACA GCTGCCCACC CTCACCAGGC ACTGAGAGGAGTTACTGCCA 4800 CCTCTTCCCG CCCCCACCGT CCCCCTGCAC GGACTCGTCC TGA 48431614 amino acids amino acid linear 42 Met Glu Thr Ala Pro Thr Arg AlaPro Pro Pro Pro Pro Pro Pro Leu 1 5 10 15 Leu Leu Leu Val Leu Tyr CysSer Leu Val Pro Ala Ala Ala Ser Pro 20 25 30 Leu Leu Leu Phe Ala Asn ArgArg Asp Val Arg Leu Val Asp Ala Gly 35 40 45 Gly Val Lys Leu Glu Ser ThrIle Val Ala Ser Gly Leu Glu Asp Ala 50 55 60 Ala Ala Val Asp Phe Gln PheSer Lys Gly Ala Val Tyr Trp Thr Asp 65 70 75 80 Val Ser Glu Glu Ala IleLys Gln Thr Tyr Leu Asn Gln Thr Gly Ala 85 90 95 Ala Ala Gln Asn Ile ValIle Ser Gly Leu Val Ser Pro Asp Gly Leu 100 105 110 Ala Cys Asp Trp ValGly Lys Lys Leu Tyr Trp Thr Asp Ser Glu Thr 115 120 125 Asn Arg Ile GluVal Ala Asn Leu Asn Gly Thr Ser Arg Lys Val Leu 130 135 140 Phe Trp GlnAsp Leu Asp Gln Pro Arg Ala Ile Ala Leu Asp Pro Ala 145 150 155 160 HisGly Tyr Met Tyr Trp Thr Asp Trp Gly Glu Ala Pro Arg Ile Glu 165 170 175Arg Ala Gly Met Asp Gly Ser Thr Arg Lys Ile Ile Val Asp Ser Asp 180 185190 Ile Tyr Trp Pro Asn Gly Leu Thr Ile Asp Leu Glu Glu Gln Lys Leu 195200 205 Tyr Trp Ala Asp Ala Lys Leu Ser Phe Ile His Arg Ala Asn Leu Asp210 215 220 Gly Ser Phe Arg Gln Lys Val Val Glu Gly Ser Leu Thr His ProPhe 225 230 235 240 Ala Leu Thr Leu Ser Gly Asp Thr Leu Tyr Trp Thr AspTrp Gln Thr 245 250 255 Arg Ser Ile His Ala Cys Asn Lys Trp Thr Gly GluGln Arg Lys Glu 260 265 270 Ile Leu Ser Ala Leu Tyr Ser Pro Met Asp IleGln Val Leu Ser Gln 275 280 285 Glu Arg Gln Pro Pro Phe His Thr Pro CysGlu Glu Asp Asn Gly Gly 290 295 300 Cys Ser His Leu Cys Leu Leu Ser ProArg Glu Pro Phe Tyr Ser Cys 305 310 315 320 Ala Cys Pro Thr Gly Val GlnLeu Gln Asp Asn Gly Lys Thr Cys Lys 325 330 
335 Thr Gly Ala Glu Glu ValLeu Leu Leu Ala Arg Arg Thr Asp Leu Arg 340 345 350 Arg Ile Ser Leu AspThr Pro Asp Phe Thr Asp Ile Val Leu Gln Val 355 360 365 Gly Asp Ile ArgHis Ala Ile Ala Ile Asp Tyr Asp Pro Leu Glu Gly 370 375 380 Tyr Val TyrTrp Thr Asp Asp Glu Val Arg Ala Ile Arg Arg Ala Tyr 385 390 395 400 LeuAsp Gly Ser Gly Ala Gln Thr Leu Val Asn Thr Glu Ile Asn Asp 405 410 415Pro Asp Gly Ile Ala Val Asp Trp Val Ala Arg Asn Leu Tyr Trp Thr 420 425430 Asp Thr Gly Thr Asp Arg Ile Glu Val Thr Arg Leu Asn Gly Thr Ser 435440 445 Arg Lys Ile Leu Val Ser Glu Asp Leu Asp Glu Pro Arg Ala Ile Val450 455 460 Leu His Pro Val Met Gly Leu Met Tyr Trp Thr Asp Trp Gly GluAsn 465 470 475 480 Pro Lys Ile Glu Cys Ala Asn Leu Asp Gly Arg Asp ArgHis Val Leu 485 490 495 Val Asn Thr Ser Leu Gly Trp Pro Asn Gly Leu AlaLeu Asp Leu Gln 500 505 510 Glu Gly Lys Leu Tyr Trp Gly Asp Ala Lys ThrAsp Lys Ile Glu Val 515 520 525 Ile Asn Ile Asp Gly Thr Lys Arg Lys ThrLeu Leu Glu Asp Lys Leu 530 535 540 Pro His Ile Phe Gly Phe Thr Leu LeuGly Asp Phe Ile Tyr Trp Thr 545 550 555 560 Asp Trp Gln Arg Arg Ser IleGlu Arg Val His Lys Val Lys Ala Ser 565 570 575 Arg Asp Val Ile Ile AspGln Leu Pro Asp Leu Met Gly Leu Lys Ala 580 585 590 Val Asn Val Ala LysVal Val Gly Thr Asn Pro Cys Ala Asp Gly Asn 595 600 605 Gly Gly Cys SerHis Leu Cys Phe Phe Thr Pro Arg Ala Thr Lys Cys 610 615 620 Gly Cys ProIle Gly Leu Glu Leu Leu Ser Asp Met Lys Thr Cys Ile 625 630 635 640 IlePro Glu Ala Phe Leu Val Phe Thr Ser Arg Ala Thr Ile His Arg 645 650 655Ile Ser Leu Glu Thr Asn Asn Asn Asp Val Ala Ile Pro Leu Thr Gly 660 665670 Val Lys Glu Ala Ser Ala Leu Asp Phe Asp Val Ser Asn Asn His Ile 675680 685 Tyr Trp Thr Asp Val Ser Leu Lys Thr Ile Ser Arg Ala Phe Met Asn690 695 700 Gly Ser Ser Val Glu His Val Ile Glu Phe Gly Leu Asp Tyr ProGlu 705 710 715 720 Gly Met Ala Val Asp Trp Met Gly Lys Asn Leu Tyr TrpAla Asp Thr 725 730 735 Gly Thr Asn Arg Ile Glu Val Ala Arg Leu Asp GlyGln Phe Arg Gln 740 745 750 Val Leu Val Trp Arg Asp Leu 
Asp Asn Pro ArgSer Leu Ala Leu Asp 755 760 765 Pro Thr Lys Gly Tyr Ile Tyr Trp Thr GluTrp Gly Gly Lys Pro Arg 770 775 780 Ile Val Arg Ala Phe Met Asp Gly ThrAsn Cys Met Thr Leu Val Asp 785 790 795 800 Lys Val Gly Arg Ala Asn AspLeu Thr Ile Asp Tyr Ala Asp Gln Arg 805 810 815 Leu Tyr Trp Thr Asp LeuAsp Thr Asn Met Ile Glu Ser Ser Asn Met 820 825 830 Leu Gly Gln Glu ArgMet Val Ile Ala Asp Asp Leu Pro Tyr Pro Phe 835 840 845 Gly Leu Thr GlnTyr Ser Asp Tyr Ile Tyr Trp Thr Asp Trp Asn Leu 850 855 860 His Ser IleGlu Arg Ala Asp Lys Thr Ser Gly Arg Asn Arg Thr Leu 865 870 875 880 IleGln Gly His Leu Asp Phe Val Met Asp Ile Leu Val Phe His Ser 885 890 895Ser Arg Gln Asp Gly Leu Asn Asp Cys Val His Ser Asn Gly Gln Cys 900 905910 Gly Gln Leu Cys Leu Ala Ile Pro Gly Gly His Arg Cys Gly Cys Ala 915920 925 Ser His Tyr Thr Leu Asp Pro Ser Ser Arg Asn Cys Ser Pro Pro Ser930 935 940 Thr Phe Leu Leu Phe Ser Gln Lys Phe Ala Ile Ser Arg Met IlePro 945 950 955 960 Asp Asp Gln Leu Ser Pro Asp Leu Val Leu Pro Leu HisGly Leu Arg 965 970 975 Asn Val Lys Ala Ile Asn Tyr Asp Pro Leu Asp LysPhe Ile Tyr Trp 980 985 990 Val Asp Gly Arg Gln Asn Ile Lys Arg Ala LysAsp Asp Gly Thr Gln 995 1000 1005 Pro Ser Met Leu Thr Ser Pro Ser GlnSer Leu Ser Pro Asp Arg Gln 1010 1015 1020 Pro His Asp Leu Ser Ile AspIle Tyr Ser Arg Thr Leu Phe Trp Thr 1025 1030 1035 1040 Cys Glu Ala ThrAsn Thr Ile Asn Val His Arg Leu Asp Gly Asp Ala 1045 1050 1055 Met GlyVal Val Leu Arg Gly Asp Arg Asp Lys Pro Arg Ala Ile Ala 1060 1065 1070Val Asn Ala Glu Arg Gly Tyr Met Tyr Phe Thr Asn Met Gln Asp His 10751080 1085 Ala Ala Lys Ile Glu Arg Ala Ser Leu Asp Gly Thr Glu Arg GluVal 1090 1095 1100 Leu Phe Thr Thr Gly Leu Ile Arg Pro Val Ala Leu ValVal Asp Asn 1105 1110 1115 1120 Ala Leu Gly Lys Leu Phe Trp Val Asp AlaAsp Leu Lys Arg Ile Glu 1125 1130 1135 Ser Cys Asp Leu Ser Gly Ala AsnArg Leu Thr Leu Glu Asp Ala Asn 1140 1145 1150 Ile Val Gln Pro Val GlyLeu Thr Val Leu Gly Arg His Leu Tyr Trp 1155 1160 1165 Ile Asp Arg GlnGln Gln 
Met Ile Glu Arg Val Glu Lys Thr Thr Gly 1170 1175 1180 Asp LysArg Thr Arg Val Gln Gly Arg Val Thr His Leu Thr Gly Ile 1185 1190 11951200 His Ala Val Glu Glu Val Ser Leu Glu Glu Phe Ser Ala His Pro Cys1205 1210 1215 Ala Arg Asp Asn Gly Gly Cys Ser His Ile Cys Ile Ala LysGly Asp 1220 1225 1230 Gly Thr Pro Arg Cys Ser Cys Pro Val His Leu ValLeu Leu Gln Asn 1235 1240 1245 Leu Leu Thr Cys Gly Glu Pro Pro Thr CysSer Pro Asp Gln Phe Ala 1250 1255 1260 Cys Thr Thr Gly Glu Ile Asp CysIle Pro Gly Ala Trp Arg Cys Asp 1265 1270 1275 1280 Gly Phe Pro Glu CysAla Asp Gln Ser Asp Glu Glu Gly Cys Pro Val 1285 1290 1295 Cys Ser AlaSer Gln Phe Pro Cys Ala Arg Gly Gln Cys Val Asp Leu 1300 1305 1310 ArgLeu Arg Cys Asp Gly Glu Ala Asp Cys Gln Asp Arg Ser Asp Glu 1315 13201325 Ala Asn Cys Asp Ala Val Cys Leu Pro Asn Gln Phe Arg Cys Thr Ser1330 1335 1340 Gly Gln Cys Val Leu Ile Lys Gln Gln Cys Asp Ser Phe ProAsp Cys 1345 1350 1355 1360 Ala Asp Gly Ser Asp Glu Leu Met Cys Glu IleAsn Lys Pro Pro Ser 1365 1370 1375 Asp Asp Ile Pro Ala His Ser Ser AlaIle Gly Pro Val Ile Gly Ile 1380 1385 1390 Ile Leu Ser Leu Phe Val MetGly Gly Val Tyr Phe Val Cys Gln Arg 1395 1400 1405 Val Met Cys Gln ArgTyr Thr Gly Ala Ser Gly Pro Phe Pro His Glu 1410 1415 1420 Tyr Val GlyGly Ala Pro His Val Pro Leu Asn Phe Ile Ala Pro Gly 1425 1430 1435 1440Gly Ser Gln His Gly Pro Phe Pro Gly Ile Pro Cys Ser Lys Ser Val 14451450 1455 Met Ser Ser Met Ser Leu Val Gly Gly Arg Gly Ser Val Pro LeuTyr 1460 1465 1470 Asp Arg Asn His Val Thr Gly Ala Ser Ser Ser Ser SerSer Ser Thr 1475 1480 1485 Lys Ala Thr Leu Tyr Pro Pro Ile Leu Asn ProPro Pro Ser Pro Ala 1490 1495 1500 Thr Asp Pro Ser Leu Tyr Asn Val AspVal Phe Tyr Ser Ser Gly Ile 1505 1510 1515 1520 Pro Ala Thr Ala Arg ProTyr Arg Pro Tyr Val Ile Arg Gly Met Ala 1525 1530 1535 Pro Pro Thr ThrPro Cys Ser Thr Asp Val Cys Asp Ser Asp Tyr Ser 1540 1545 1550 Ile SerArg Trp Lys Ser Ser Lys Tyr Tyr Leu Asp Leu Asn Ser Asp 1555 1560 1565Ser Asp Pro Tyr Pro Pro Pro Pro Thr Pro His Ser Gln 
Tyr Leu Ser 15701575 1580 Ala Glu Asp Ser Cys Pro Pro Ser Pro Gly Thr Glu Arg Ser TyrCys 1585 1590 1595 1600 His Leu Phe Pro Pro Pro Pro Ser Pro Cys Thr AspSer Ser 1605 1610 1591 amino acids amino acid linear 43 Cys Pro Ala ProAla Ala Ala Ser Pro Leu Leu Leu Phe Ala Asn Arg 1 5 10 15 Arg Asp ValArg Leu Val Asp Ala Gly Gly Val Lys Leu Glu Ser Thr 20 25 30 Ile Val ValSer Gly Leu Glu Asp Ala Ala Ala Val Asp Phe Gln Phe 35 40 45 Ser Lys GlyAla Val Tyr Trp Thr Asp Val Ser Glu Glu Ala Ile Lys 50 55 60 Gln Thr TyrLeu Asn Gln Thr Gly Ala Ala Val Gln Asn Val Val Ile 65 70 75 80 Ser GlyLeu Val Ser Pro Asp Gly Leu Ala Cys Asp Trp Val Gly Lys 85 90 95 Lys LeuTyr Trp Thr Asp Ser Glu Thr Asn Arg Ile Glu Val Ala Asn 100 105 110 LeuAsn Gly Thr Ser Arg Lys Val Leu Phe Trp Gln Asp Leu Asp Gln 115 120 125Pro Arg Ala Ile Ala Leu Asp Pro Ala His Gly Tyr Met Tyr Trp Thr 130 135140 Asp Trp Gly Glu Thr Pro Arg Ile Glu Arg Ala Gly Met Asp Gly Ser 145150 155 160 Thr Arg Lys Ile Ile Val Asp Ser Asp Ile Tyr Trp Pro Asn GlyLeu 165 170 175 Thr Ile Asp Leu Glu Glu Gln Lys Leu Tyr Trp Ala Asp AlaLys Leu 180 185 190 Ser Phe Ile His Arg Ala Asn Leu Asp Gly Ser Phe ArgGln Lys Val 195 200 205 Val Glu Gly Ser Leu Thr His Pro Phe Ala Leu ThrLeu Ser Gly Asp 210 215 220 Thr Leu Tyr Trp Thr Asp Trp Gln Thr Arg SerIle His Ala Cys Asn 225 230 235 240 Lys Arg Thr Gly Gly Lys Arg Lys GluIle Leu Ser Ala Leu Tyr Ser 245 250 255 Pro Met Asp Ile Gln Val Leu SerGln Glu Arg Gln Pro Phe Phe His 260 265 270 Thr Arg Cys Glu Glu Asp AsnGly Gly Cys Ser His Leu Cys Leu Leu 275 280 285 Ser Pro Ser Glu Pro PheTyr Thr Cys Ala Cys Pro Thr Gly Val Gln 290 295 300 Leu Gln Asp Asn GlyArg Thr Cys Lys Ala Gly Ala Glu Glu Val Leu 305 310 315 320 Leu Leu AlaArg Arg Thr Asp Leu Arg Arg Ile Ser Leu Asp Thr Pro 325 330 335 Asp PheThr Asp Ile Val Leu Gln Val Asp Asp Ile Arg His Ala Ile 340 345 350 AlaIle Asp Tyr Asp Pro Leu Glu Gly Tyr Val Tyr Trp Thr Asp Asp 355 360 365Glu Val Arg Ala Ile Arg Arg Ala Tyr Leu Asp Gly Ser Gly Ala Gln 
370 375380 Thr Leu Val Asn Thr Glu Ile Asn Asp Pro Asp Gly Ile Ala Val Asp 385390 395 400 Trp Val Ala Arg Asn Leu Tyr Trp Thr Asp Thr Gly Thr Asp ArgIle 405 410 415 Glu Val Thr Arg Leu Asn Gly Thr Ser Arg Lys Ile Leu ValSer Glu 420 425 430 Asp Leu Asp Glu Pro Arg Ala Ile Ala Leu His Pro ValMet Gly Leu 435 440 445 Met Tyr Trp Thr Asp Trp Gly Glu Asn Pro Lys IleGlu Cys Ala Asn 450 455 460 Leu Asp Gly Gln Glu Arg Arg Val Leu Val AsnAla Ser Leu Gly Trp 465 470 475 480 Pro Asn Gly Leu Ala Leu Asp Leu GlnGlu Gly Lys Leu Tyr Trp Gly 485 490 495 Asp Ala Lys Thr Asp Lys Ile GluVal Ile Asn Val Asp Gly Thr Lys 500 505 510 Arg Arg Thr Leu Leu Glu AspLys Leu Pro His Ile Phe Gly Phe Thr 515 520 525 Leu Leu Gly Asp Phe IleTyr Trp Thr Asp Trp Gln Arg Arg Ser Ile 530 535 540 Glu Arg Val His LysVal Lys Ala Ser Arg Asp Val Ile Ile Asp Gln 545 550 555 560 Leu Pro AspLeu Met Gly Leu Lys Ala Val Asn Val Ala Lys Val Val 565 570 575 Gly ThrAsn Pro Cys Ala Asp Arg Asn Gly Gly Cys Ser His Leu Cys 580 585 590 PhePhe Thr Pro His Ala Thr Arg Cys Gly Cys Pro Ile Gly Leu Glu 595 600 605Leu Leu Ser Asp Met Lys Thr Cys Ile Val Pro Glu Ala Phe Leu Val 610 615620 Phe Thr Ser Arg Ala Ala Ile His Arg Ile Ser Leu Glu Thr Asn Asn 625630 635 640 Asn Asp Val Ala Ile Pro Leu Thr Gly Val Lys Glu Ala Ser AlaLeu 645 650 655 Asp Phe Asp Val Ser Asn Asn His Ile Tyr Trp Thr Asp ValSer Leu 660 665 670 Lys Thr Ile Ser Arg Ala Phe Met Asn Gly Ser Ser ValGlu His Val 675 680 685 Val Glu Phe Gly Leu Asp Tyr Pro Glu Gly Met AlaVal Asp Trp Met 690 695 700 Gly Lys Asn Leu Tyr Trp Ala Asp Thr Gly ThrAsn Arg Ile Glu Val 705 710 715 720 Ala Arg Leu Asp Gly Gln Phe Arg GlnVal Leu Val Trp Arg Asp Leu 725 730 735 Asp Asn Pro Arg Ser Leu Ala LeuAsp Pro Thr Lys Gly Tyr Ile Tyr 740 745 750 Trp Thr Glu Trp Gly Gly LysPro Arg Ile Val Arg Ala Phe Met Asp 755 760 765 Gly Thr Asn Cys Met ThrLeu Val Asp Lys Val Gly Arg Ala Asn Asp 770 775 780 Leu Thr Ile Asp TyrAla Asp Gln Arg Leu Tyr Trp Thr Asp Leu Asp 785 790 795 800 Thr Asn MetIle 
Glu Ser Ser Asn Met Leu Gly Gln Glu Arg Val Val 805 810 815 Ile AlaAsp Asp Leu Pro His Pro Phe Gly Leu Thr Gln Tyr Ser Asp 820 825 830 TyrIle Tyr Trp Thr Asp Trp Asn Leu His Ser Ile Glu Arg Ala Asp 835 840 845Lys Thr Ser Gly Arg Asn Arg Thr Leu Ile Gln Gly His Leu Asp Phe 850 855860 Val Met Asp Ile Leu Val Phe His Ser Ser Arg Gln Asp Gly Leu Asn 865870 875 880 Asp Cys Met His Asn Asn Gly Gln Cys Gly Gln Leu Cys Leu AlaIle 885 890 895 Pro Gly Gly His Arg Cys Gly Cys Ala Ser His Tyr Thr LeuAsp Pro 900 905 910 Ser Ser Arg Asn Cys Ser Pro Pro Thr Thr Phe Leu LeuPhe Ser Gln 915 920 925 Lys Ser Ala Ile Ser Arg Met Ile Pro Asp Asp GlnHis Ser Pro Asp 930 935 940 Leu Ile Leu Pro Leu His Gly Leu Arg Asn ValLys Ala Ile Asp Tyr 945 950 955 960 Asp Pro Leu Asp Lys Phe Ile Tyr TrpVal Asp Gly Arg Gln Asn Ile 965 970 975 Lys Arg Ala Lys Asp Asp Gly ThrGln Pro Phe Val Leu Thr Ser Leu 980 985 990 Ser Gln Gly Gln Asn Pro AspArg Gln Pro His Asp Leu Ser Ile Asp 995 1000 1005 Ile Tyr Ser Arg ThrLeu Phe Trp Thr Cys Glu Ala Thr Asn Thr Ile 1010 1015 1020 Asn Val HisArg Leu Ser Gly Glu Ala Met Gly Val Val Leu Arg Gly 1025 1030 1035 1040Asp Arg Asp Lys Pro Arg Ala Ile Val Val Asn Ala Glu Arg Gly Tyr 10451050 1055 Leu Tyr Phe Thr Asn Met Gln Asp Arg Ala Ala Lys Ile Glu ArgAla 1060 1065 1070 Ala Leu Asp Gly Thr Glu Arg Glu Val Leu Phe Thr ThrGly Leu Ile 1075 1080 1085 Arg Pro Val Ala Leu Val Val Asp Asn Thr LeuGly Lys Leu Phe Trp 1090 1095 1100 Val Asp Ala Asp Leu Lys Arg Ile GluSer Cys Asp Leu Ser Gly Ala 1105 1110 1115 1120 Asn Arg Leu Thr Leu GluAsp Ala Asn Ile Val Gln Pro Leu Gly Leu 1125 1130 1135 Thr Ile Leu GlyLys His Leu Tyr Trp Ile Asp Arg Gln Gln Gln Met 1140 1145 1150 Ile GluArg Val Glu Lys Thr Thr Gly Asp Lys Arg Thr Arg Ile Gln 1155 1160 1165Gly Arg Val Ala His Leu Thr Gly Ile His Ala Val Glu Glu Val Ser 11701175 1180 Leu Glu Glu Phe Ser Ala His Pro Cys Ala Arg Asp Asn Gly GlyCys 1185 1190 1195 1200 Ser His Ile Cys Ile Ala Lys Gly Asp Gly Thr ProArg Cys Ser Cys 1205 1210 1215 Pro 
Val His Leu Val Leu Leu Gln Asn LeuLeu Thr Cys Gly Glu Pro 1220 1225 1230 Pro Thr Cys Ser Pro Asp Gln PheAla Cys Ala Thr Gly Glu Ile Asp 1235 1240 1245 Cys Ile Pro Gly Ala TrpArg Cys Asp Gly Phe Pro Glu Cys Asp Asp 1250 1255 1260 Gln Ser Asp GluGlu Gly Cys Pro Val Cys Ser Ala Ala Gln Phe Pro 1265 1270 1275 1280 CysAla Arg Gly Gln Cys Val Asp Leu Arg Leu Arg Cys Asp Gly Glu 1285 12901295 Ala Asp Cys Gln Asp Arg Ser Asp Glu Ala Asp Cys Asp Ala Ile Cys1300 1305 1310 Leu Pro Asn Gln Phe Arg Cys Ala Ser Gly Gln Cys Val LeuIle Lys 1315 1320 1325 Gln Gln Cys Asp Ser Phe Pro Asp Cys Ile Asp GlySer Asp Glu Leu 1330 1335 1340 Met Cys Glu Ile Thr Lys Pro Pro Ser AspAsp Ser Pro Ala His Ser 1345 1350 1355 1360 Ser Ala Ile Gly Pro Val IleGly Ile Ile Leu Ser Leu Phe Val Met 1365 1370 1375 Gly Gly Val Tyr PheVal Cys Gln Arg Val Val Cys Gln Arg Tyr Ala 1380 1385 1390 Gly Ala AsnGly Pro Phe Pro His Glu Tyr Val Ser Gly Thr Pro His 1395 1400 1405 ValPro Leu Asn Phe Ile Ala Pro Gly Gly Ser Gln His Gly Pro Phe 1410 14151420 Thr Gly Ile Ala Cys Gly Lys Ser Met Met Ser Ser Val Ser Leu Met1425 1430 1435 1440 Gly Gly Arg Gly Gly Val Pro Leu Tyr Asp Arg Asn HisVal Thr Gly 1445 1450 1455 Ala Ser Ser Ser Ser Ser Ser Ser Thr Lys AlaThr Leu Tyr Pro Pro 1460 1465 1470 Ile Leu Asn Pro Pro Pro Ser Pro AlaThr Asp Pro Ser Leu Tyr Asn 1475 1480 1485 Met Asp Met Phe Tyr Ser SerAsn Ile Pro Ala Thr Val Arg Pro Tyr 1490 1495 1500 Arg Pro Tyr Ile IleArg Gly Met Ala Pro Pro Thr Thr Pro Cys Ser 1505 1510 1515 1520 Thr AspVal Cys Asp Ser Asp Tyr Ser Ala Ser Arg Trp Lys Ala Ser 1525 1530 1535Lys Tyr Tyr Leu Asp Leu Asn Ser Asp Ser Asp Pro Tyr Pro Pro Pro 15401545 1550 Pro Thr Pro His Ser Gln Tyr Leu Ser Ala Glu Asp Ser Cys ProPro 1555 1560 1565 Ser Pro Ala Thr Glu Arg Ser Tyr Phe His Leu Phe ProPro Pro Pro 1570 1575 1580 Ser Pro Cys Thr Asp Ser Ser 1585 1590 1586amino acids amino acid linear 44 Ala Ala Ser Pro Leu Leu Leu Phe Ala AsnArg Arg Asp Val Arg Leu 1 5 10 15 Val Asp Ala Gly Gly Val Lys Leu GluSer Thr 
Ile Val Ala Ser Gly 20 25 30 Leu Glu Asp Ala Ala Ala Val Asp PheGln Phe Ser Lys Gly Ala Val 35 40 45 Tyr Trp Thr Asp Val Ser Glu Glu AlaIle Lys Gln Thr Tyr Leu Asn 50 55 60 Gln Thr Gly Ala Ala Ala Gln Asn IleVal Ile Ser Gly Leu Val Ser 65 70 75 80 Pro Asp Gly Leu Ala Cys Asp TrpVal Gly Lys Lys Leu Tyr Trp Thr 85 90 95 Asp Ser Glu Thr Asn Arg Ile GluVal Ala Asn Leu Asn Gly Thr Ser 100 105 110 Arg Lys Val Leu Phe Trp GlnAsp Leu Asp Gln Pro Arg Ala Ile Ala 115 120 125 Leu Asp Pro Ala His GlyTyr Met Tyr Trp Thr Asp Trp Gly Glu Ala 130 135 140 Pro Arg Ile Glu ArgAla Gly Met Asp Gly Ser Thr Arg Lys Ile Ile 145 150 155 160 Val Asp SerAsp Ile Tyr Trp Pro Asn Gly Leu Thr Ile Asp Leu Glu 165 170 175 Glu GlnLys Leu Tyr Trp Ala Asp Ala Lys Leu Ser Phe Ile His Arg 180 185 190 AlaAsn Leu Asp Gly Ser Phe Arg Gln Lys Val Val Glu Gly Ser Leu 195 200 205Thr His Pro Phe Ala Leu Thr Leu Ser Gly Asp Thr Leu Tyr Trp Thr 210 215220 Asp Trp Gln Thr Arg Ser Ile His Ala Cys Asn Lys Trp Thr Gly Glu 225230 235 240 Gln Arg Lys Glu Ile Leu Ser Ala Leu Tyr Ser Pro Met Asp IleGln 245 250 255 Val Leu Ser Gln Glu Arg Gln Pro Pro Phe His Thr Pro CysGlu Glu 260 265 270 Asp Asn Gly Gly Cys Ser His Leu Cys Leu Leu Ser ProArg Glu Pro 275 280 285 Phe Tyr Ser Cys Ala Cys Pro Thr Gly Val Gln LeuGln Asp Asn Gly 290 295 300 Lys Thr Cys Lys Thr Gly Ala Glu Glu Val LeuLeu Leu Ala Arg Arg 305 310 315 320 Thr Asp Leu Arg Arg Ile Ser Leu AspThr Pro Asp Phe Thr Asp Ile 325 330 335 Val Leu Gln Val Gly Asp Ile ArgHis Ala Ile Ala Ile Asp Tyr Asp 340 345 350 Pro Leu Glu Gly Tyr Val TyrTrp Thr Asp Asp Glu Val Arg Ala Ile 355 360 365 Arg Arg Ala Tyr Leu AspGly Ser Gly Ala Gln Thr Leu Val Asn Thr 370 375 380 Glu Ile Asn Asp ProAsp Gly Ile Ala Val Asp Trp Val Ala Arg Asn 385 390 395 400 Leu Tyr TrpThr Asp Thr Gly Thr Asp Arg Ile Glu Val Thr Arg Leu 405 410 415 Asn GlyThr Ser Arg Lys Ile Leu Val Ser Glu Asp Leu Asp Glu Pro 420 425 430 ArgAla Ile Val Leu His Pro Val Met Gly Leu Met Tyr Trp Thr Asp 435 440 445Trp Gly Glu Asn 
Pro Lys Ile Glu Cys Ala Asn Leu Asp Gly Arg Asp 450 455460 Arg His Val Leu Val Asn Thr Ser Leu Gly Trp Pro Asn Gly Leu Ala 465470 475 480 Leu Asp Leu Gln Glu Gly Lys Leu Tyr Trp Gly Asp Ala Lys ThrAsp 485 490 495 Lys Ile Glu Val Ile Asn Ile Asp Gly Thr Lys Arg Lys ThrLeu Leu 500 505 510 Glu Asp Lys Leu Pro His Ile Phe Gly Phe Thr Leu LeuGly Asp Phe 515 520 525 Ile Tyr Trp Thr Asp Trp Gln Arg Arg Ser Ile GluArg Val His Lys 530 535 540 Val Lys Ala Ser Arg Asp Val Ile Ile Asp GlnLeu Pro Asp Leu Met 545 550 555 560 Gly Leu Lys Ala Val Asn Val Ala LysVal Val Gly Thr Asn Pro Cys 565 570 575 Ala Asp Gly Asn Gly Gly Cys SerHis Leu Cys Phe Phe Thr Pro Arg 580 585 590 Ala Thr Lys Cys Gly Cys ProIle Gly Leu Glu Leu Leu Ser Asp Met 595 600 605 Lys Thr Cys Ile Ile ProGlu Ala Phe Leu Val Phe Thr Ser Arg Ala 610 615 620 Thr Ile His Arg IleSer Leu Glu Thr Asn Asn Asn Asp Val Ala Ile 625 630 635 640 Pro Leu ThrGly Val Lys Glu Ala Ser Ala Leu Asp Phe Asp Val Ser 645 650 655 Asn AsnHis Ile Tyr Trp Thr Asp Val Ser Leu Lys Thr Ile Ser Arg 660 665 670 AlaPhe Met Asn Gly Ser Ser Val Glu His Val Ile Glu Phe Gly Leu 675 680 685Asp Tyr Pro Glu Gly Met Ala Val Asp Trp Met Gly Lys Asn Leu Tyr 690 695700 Trp Ala Asp Thr Gly Thr Asn Arg Ile Glu Val Ala Arg Leu Asp Gly 705710 715 720 Gln Phe Arg Gln Val Leu Val Trp Arg Asp Leu Asp Asn Pro ArgSer 725 730 735 Leu Ala Leu Asp Pro Thr Lys Gly Tyr Ile Tyr Trp Thr GluTrp Gly 740 745 750 Gly Lys Pro Arg Ile Val Arg Ala Phe Met Asp Gly ThrAsn Cys Met 755 760 765 Thr Leu Val Asp Lys Val Gly Arg Ala Asn Asp LeuThr Ile Asp Tyr 770 775 780 Ala Asp Gln Arg Leu Tyr Trp Thr Asp Leu AspThr Asn Met Ile Glu 785 790 795 800 Ser Ser Asn Met Leu Gly Gln Glu ArgMet Val Ile Ala Asp Asp Leu 805 810 815 Pro Tyr Pro Phe Gly Leu Thr GlnTyr Ser Asp Tyr Ile Tyr Trp Thr 820 825 830 Asp Trp Asn Leu His Ser IleGlu Arg Ala Asp Lys Thr Ser Gly Arg 835 840 845 Asn Arg Thr Leu Ile GlnGly His Leu Asp Phe Val Met Asp Ile Leu 850 855 860 Val Phe His Ser SerArg Gln Asp Gly Leu Asn Asp 
Cys Val His Ser 865 870 875 880 Asn Gly GlnCys Gly Gln Leu Cys Leu Ala Ile Pro Gly Gly His Arg 885 890 895 Cys GlyCys Ala Ser His Tyr Thr Leu Asp Pro Ser Ser Arg Asn Cys 900 905 910 SerPro Pro Ser Thr Phe Leu Leu Phe Ser Gln Lys Phe Ala Ile Ser 915 920 925Arg Met Ile Pro Asp Asp Gln Leu Ser Pro Asp Leu Val Leu Pro Leu 930 935940 His Gly Leu Arg Asn Val Lys Ala Ile Asn Tyr Asp Pro Leu Asp Lys 945950 955 960 Phe Ile Tyr Trp Val Asp Gly Arg Gln Asn Ile Lys Arg Ala LysAsp 965 970 975 Asp Gly Thr Gln Pro Ser Met Leu Thr Ser Pro Ser Gln SerLeu Ser 980 985 990 Pro Asp Arg Gln Pro His Asp Leu Ser Ile Asp Ile TyrSer Arg Thr 995 1000 1005 Leu Phe Trp Thr Cys Glu Ala Thr Asn Thr IleAsn Val His Arg Leu 1010 1015 1020 Asp Gly Asp Ala Met Gly Val Val LeuArg Gly Asp Arg Asp Lys Pro 1025 1030 1035 1040 Arg Ala Ile Ala Val AsnAla Glu Arg Gly Tyr Met Tyr Phe Thr Asn 1045 1050 1055 Met Gln Asp HisAla Ala Lys Ile Glu Arg Ala Ser Leu Asp Gly Thr 1060 1065 1070 Glu ArgGlu Val Leu Phe Thr Thr Gly Leu Ile Arg Pro Val Ala Leu 1075 1080 1085Val Val Asp Asn Ala Leu Gly Lys Leu Phe Trp Val Asp Ala Asp Leu 10901095 1100 Lys Arg Ile Glu Ser Cys Asp Leu Ser Gly Ala Asn Arg Leu ThrLeu 1105 1110 1115 1120 Glu Asp Ala Asn Ile Val Gln Pro Val Gly Leu ThrVal Leu Gly Arg 1125 1130 1135 His Leu Tyr Trp Ile Asp Arg Gln Gln GlnMet Ile Glu Arg Val Glu 1140 1145 1150 Lys Thr Thr Gly Asp Lys Arg ThrArg Val Gln Gly Arg Val Thr His 1155 1160 1165 Leu Thr Gly Ile His AlaVal Glu Glu Val Ser Leu Glu Glu Phe Ser 1170 1175 1180 Ala His Pro CysAla Arg Asp Asn Gly Gly Cys Ser His Ile Cys Ile 1185 1190 1195 1200 AlaLys Gly Asp Gly Thr Pro Arg Cys Ser Cys Pro Val His Leu Val 1205 12101215 Leu Leu Gln Asn Leu Leu Thr Cys Gly Glu Pro Pro Thr Cys Ser Pro1220 1225 1230 Asp Gln Phe Ala Cys Thr Thr Gly Glu Ile Asp Cys Ile ProGly Ala 1235 1240 1245 Trp Arg Cys Asp Gly Phe Pro Glu Cys Ala Asp GlnSer Asp Glu Glu 1250 1255 1260 Gly Cys Pro Val Cys Ser Ala Ser Gln PhePro Cys Ala Arg Gly Gln 1265 1270 1275 1280 Cys Val Asp Leu 
Arg Leu ArgCys Asp Gly Glu Ala Asp Cys Gln Asp 1285 1290 1295 Arg Ser Asp Glu AlaAsn Cys Asp Ala Val Cys Leu Pro Asn Gln Phe 1300 1305 1310 Arg Cys ThrSer Gly Gln Cys Val Leu Ile Lys Gln Gln Cys Asp Ser 1315 1320 1325 PhePro Asp Cys Ala Asp Gly Ser Asp Glu Leu Met Cys Glu Ile Asn 1330 13351340 Lys Pro Pro Ser Asp Asp Ile Pro Ala His Ser Ser Ala Ile Gly Pro1345 1350 1355 1360 Val Ile Gly Ile Ile Leu Ser Leu Phe Val Met Gly GlyVal Tyr Phe 1365 1370 1375 Val Cys Gln Arg Val Met Cys Gln Arg Tyr ThrGly Ala Ser Gly Pro 1380 1385 1390 Phe Pro His Glu Tyr Val Gly Gly AlaPro His Val Pro Leu Asn Phe 1395 1400 1405 Ile Ala Pro Gly Gly Ser GlnHis Gly Pro Phe Pro Gly Ile Pro Cys 1410 1415 1420 Ser Lys Ser Val MetSer Ser Met Ser Leu Val Gly Gly Arg Gly Ser 1425 1430 1435 1440 Val ProLeu Tyr Asp Arg Asn His Val Thr Gly Ala Ser Ser Ser Ser 1445 1450 1455Ser Ser Ser Thr Lys Ala Thr Leu Tyr Pro Pro Ile Leu Asn Pro Pro 14601465 1470 Pro Ser Pro Ala Thr Asp Pro Ser Leu Tyr Asn Val Asp Val PheTyr 1475 1480 1485 Ser Ser Gly Ile Pro Ala Thr Ala Arg Pro Tyr Arg ProTyr Val Ile 1490 1495 1500 Arg Gly Met Ala Pro Pro Thr Thr Pro Cys SerThr Asp Val Cys Asp 1505 1510 1515 1520 Ser Asp Tyr Ser Ile Ser Arg TrpLys Ser Ser Lys Tyr Tyr Leu Asp 1525 1530 1535 Leu Asn Ser Asp Ser AspPro Tyr Pro Pro Pro Pro Thr Pro His Ser 1540 1545 1550 Gln Tyr Leu SerAla Glu Asp Ser Cys Pro Pro Ser Pro Gly Thr Glu 1555 1560 1565 Arg SerTyr Cys His Leu Phe Pro Pro Pro Pro Ser Pro Cys Thr Asp 1570 1575 1580Ser Ser 1585 4 amino acids amino acid linear 45 Asn Pro Xaa Tyr 1 4amino acids amino acid linear 46 Tyr Trp Thr Asp 1 4 amino acids aminoacid linear 47 Asn Gly Gly Cys 1 4 amino acids amino acid linear 48 ValPro Leu Tyr 1 17 base pairs nucleic acid single linear 49 ATGGAGCCCGAGTGAGC 17 20 base pairs nucleic acid single linear 50 ATGGTGGACTCCAGCTTGAC 20 19 base pairs nucleic acid single linear 51 TTCCAGTTTTCCAAGGGAG 19 20 base pairs nucleic acid single linear 52 AAAACTGGAAGTCCACTGCG 20 18 base pairs nucleic acid 
single linear 53 GGTCTGCTTGATGGCCTC 18 19 base pairs nucleic acid single linear 54 GTGCAGAACGTGGTCATCT 19 20 base pairs nucleic acid single linear 55 AGTCCACAATGATCTTCCGG 20 20 base pairs nucleic acid single linear 56 CCAATGGACTGACCATCGAC 20 20 base pairs nucleic acid single linear 57 GTCGATGGTCAGTCCATTGG 20 19 base pairs nucleic acid single linear 58 TTGTCCTCCTCACAGCGAG 19 20 base pairs nucleic acid single linear 59 GGACTTCATCTACTGGACTG 20 20 base pairs nucleic acid single linear 60 CAGTCTGTCCAGTACATGAG 20 20 base pairs nucleic acid single linear 61 GCCTTCTTGGTCTTCACCAG 20 20 base pairs nucleic acid single linear 62 GGACCAACAGAATCGAAGTG 20 17 base pairs nucleic acid single linear 63 GTCAATGGTGAGGTCGT 17 20 base pairs nucleic acid single linear 64 ACACCAACATGATCGAGTCG 20 20 base pairs nucleic acid single linear 65 ACAAGTTCATCTACTGGGTG 20 20 base pairs nucleic acid single linear 66 CGGACACTGTTCTGGACGTG 20 20 base pairs nucleic acid single linear 67 CACGTCCAGAACAGTGTCCG 20 20 base pairs nucleic acid single linear 68 TCCAGTAGAGATGCTTGCCA 20 20 base pairs nucleic acid single linear 69 ATCGAGCGTGTGGAGAAGAC 20 20 base pairs nucleic acid single linear 70 TCCTCATCAAACAGCAGTGC 20 19 base pairs nucleic acid single linear 71 CGGCTTGGTGATTTCACAC 19 21 base pairs nucleic acid single linear 72 GTGTGTGACAGCGACTACAG C 21 21 base pairs nucleic acid single linear 73 GCTGTAGTCGCTGTCACACA C 21 20 base pairs nucleic acid single linear 74 GTACAAAGTTCTCCCAGCCC 20 20 base pairs nucleic acid single linear 75 TCTTCTCCAGAGGATGCAGC 20 20 base pairs nucleic acid single linear 76 TTCGTCTTGAACTTCCCAGC 20 21 base pairs nucleic acid single linear 77 TCTTCTTCTCCAGAGGATGC A 21 20 base pairs nucleic acid single linear 78 AGGCTGGTCTCAAACTCCTG 20 20 base pairs nucleic acid single linear 79 GGGGATGTGCTGCAAGGCGA 20 22 base pairs nucleic acid single linear 80 CCAGGGTTTTCCCAGTCACG AC 22 25 base pairs nucleic acid single linear 81 TTGTGTGGAATTGTGAGCGG ATAAC 25 25 base pairs nucleic acid single linear 
82CCCAGGCTTT ACACTTTATG CTTCC 25 20 base pairs nucleic acid single linear83 CAGGGTTTCA TCCTTTGTGG 20 38 base pairs nucleic acid single linear 84TGTAAAACGA CGGCCAGTCA GGGTTTCATC CTTTGTGG 38 40 base pairs nucleic acidsingle linear 85 GCTATGACCA TGATTACGCC CAGGGTTTCA TCCTTTGTGG 40 20 basepairs nucleic acid single linear 86 TGACGGGAAG AGTTCCTCAG 20 40 basepairs nucleic acid single linear 87 GCTATGACCA TGATTACGCC TGACGGGAAGAGTTCCTCAG 40 20 base pairs nucleic acid single linear 88 TCTGCTCTTCCTGAACTGCC 20 38 base pairs nucleic acid single linear 89 TGTAAAACGACGGCCAGTTC TGCTCTTCCT GAACTGCC 38 20 base pairs nucleic acid singlelinear 90 TTGAGTCCTT CAACAAGCCC 20 40 base pairs nucleic acid singlelinear 91 GCTATGACCA TGATTACGCC TTGAGTCCTT CAACAAGCCC 40 38 base pairsnucleic acid single linear 92 TGTAAAACGA CGGCCAGTTT CCCCACTCAT AGAGGCTC38 38 base pairs nucleic acid single linear 93 GCTATGACCA TGATTACGCCGCTCCCAACT CGCCAAGT 38 36 base pairs nucleic acid single linear 94TGTAAAACGA CGGCCAGTGG TCAACATGGA GGCAGC 36 38 base pairs nucleic acidsingle linear 95 GCTATGACCA TGATTACGCC CAGGTGTCAG TCCGCTTG 38 35 basepairs nucleic acid single linear 96 TGTAAAACGA CGGCCAGTGC AGAGAAGTTCTGAGC 35 39 base pairs nucleic acid single linear 97 GCTATGACCATGATTACGCC CACTTGGCCA GCCATACTC 39 38 base pairs nucleic acid singlelinear 98 TGTAAAACGA CGGCCAGTCA AGCAAGCCTC TTGCTACC 38 40 base pairsnucleic acid single linear 99 GCTATGACCA TGATTACGCC ACTGCAATGAGGTGAAAGGC 40 38 base pairs nucleic acid single linear 100 TGTAAAACGACGGCCAGTCA GGTGAGAACA AGTGTCCG 38 38 base pairs nucleic acid singlelinear 101 GCTATGACCA TGATTACGCC GCTGCCTCCA TGTTGACC 38 37 base pairsnucleic acid single linear 102 TGTAAAACGA CGGCCAGTTG TGCCTGGGTG AGATTCT37 40 base pairs nucleic acid single linear 103 GCTATGACCA TGATTACGCCTGTGGAGCCT CTATGAGTGG 40 37 base pairs nucleic acid single linear 104TGTAAAACGA CGGCCAGTGG GTGACAGGTG GCAGTAG 37 40 base pairs nucleic acidsingle linear 105 GCTATGACCA TGATTACGCC GGAAGGAAGG ACACTTGAGC 40 38 basepairs 
nucleic acid single linear 106 TGTAAAACGA CGGCCAGTCC TGGTGTGTTTGAGAACCC 38 39 base pairs nucleic acid single linear 107 GCTATGACCATGATTACGCC CAATGGGAAG CCAGGCTAG 39 20 base pairs nucleic acid singlelinear 108 ATCTTGCTGG CTTAGCCAGT 20 38 base pairs nucleic acid singlelinear 109 TGTAAAACGA CGGCCAGTAT CTTGCTGGCT TAGCCAGT 38 40 base pairsnucleic acid single linear 110 GCTATGACCA TGATTACGCC ATCTTGCTGGCTTAGCCAGT 40 21 base pairs nucleic acid single linear 111 GCTCATGCAAATTCGAGAGA G 21 41 base pairs nucleic acid single linear 112 GCTATGACCATGATTACGCC GCTCATGCAA ATTCGAGAGA G 41 21 base pairs nucleic acid singlelinear 113 CCTGTTGGTT ATTTCCGATG G 21 39 base pairs nucleic acid singlelinear 114 TGTAAAACGA CGGCCAGTCC TGTTGGTTAT TTCCGATGG 39 41 base pairsnucleic acid single linear 115 GCTATGACCA TGATTACGCC CCTGTTGGTTATTTCCGATG G 41 21 base pairs nucleic acid single linear 116 CCTGAGTTAAGAAGGAACGC C 21 41 base pairs nucleic acid single linear 117 GCTATGACCATGATTACGCC CCTGAGTTAA GAAGGAACGC C 41 19 base pairs nucleic acid singlelinear 118 AATTGGGTCA GCAGCAATG 19 39 base pairs nucleic acid singlelinear 119 GCTATGACCA TGATTACGCC AATTGGGTCA GCAGCAATG 39 19 base pairsnucleic acid single linear 120 AATTGGGTCA GCAGCAATG 19 37 base pairsnucleic acid single linear 121 TGTAAAACGA CGGCCAGTAA TTGGGTCAGC AGCAATG37 20 base pairs nucleic acid single linear 122 TTGGATCGCT AGAGATTGGG 2040 base pairs nucleic acid single linear 123 GCTATGACCA TGATTACGCCTTGGATCGCT AGAGATTGGG 40 19 base pairs nucleic acid single linear 124GCACCCTAAT TGGCACTCA 19 39 base pairs nucleic acid single linear 125GCTATGACCA TGATTACGCC GCACCCTAAT TGGCACTCA 39 20 base pairs nucleic acidsingle linear 126 TGACGGTCCT CTTCTGGAAC 20 40 base pairs nucleic acidsingle linear 127 GCTATGACCA TGATTACGCC TGACGGTCCT CTTCTGGAAC 40 20 basepairs nucleic acid single linear 128 CGAGGCAGGA TGTGACTCAT 20 38 basepairs nucleic acid single linear 129 TGTAAAACGA CGGCCAGTCG AGGCAGGATGTGACTCAT 38 40 base pairs nucleic acid single linear 130 
GCTATGACCATGATTACGCC CGAGGCAGGA TGTGACTCAT 40 19 base pairs nucleic acid singlelinear 131 AGTGGATCAT TTCGAACGG 19 39 base pairs nucleic acid singlelinear 132 GCTATGACCA TGATTACGCC AGTGGATCAT TTCGAACGG 39 20 base pairsnucleic acid single linear 133 CCAACTCAGC TTCCCGAGTA 20 40 base pairsnucleic acid single linear 134 GCTATGACCA TGATTACGCC CCAACTCAGCTTCCCGAGTA 40 20 base pairs nucleic acid single linear 135 TGGCTGAGTATTTCCCTTGC 20 38 base pairs nucleic acid single linear 136 TGTAAAACGACGGCCAGTTG GCTGAGTATT TCCCTTGC 38 40 base pairs nucleic acid singlelinear 137 GCTATGACCA TGATTACGCC TGGCTGAGTA TTTCCCTTGC 40 19 base pairsnucleic acid single linear 138 TTTAACAAGC CCTCCTCCG 19 39 base pairsnucleic acid single linear 139 GCTATGACCA TGATTACGCC TTTAACAAGCCCTCCTCCG 39 19 base pairs nucleic acid single linear 140 CAACGCCAGCATCTACTGA 19 37 base pairs nucleic acid single linear 141 TGTAAAACGACGGCCAGTCA ACGCCAGCAT CTACTGA 37 39 base pairs nucleic acid singlelinear 142 GCTATGACCA TGATTACGCC CAACGCCAGC ATCTACTGA 39 20 base pairsnucleic acid single linear 143 CAAATAGCAG AGCACAGGCA 20 40 base pairsnucleic acid single linear 144 GCTATGACCA TGATTACGCC CAAATAGCAGAGCACAGGCA 40 19 base pairs nucleic acid single linear 145 TGAAGTTGCTGCTCTTGGG 19 37 base pairs nucleic acid single linear 146 TGTAAAACGACGGCCAGTTG AAGTTGCTGC TCTTGGG 37 39 base pairs nucleic acid singlelinear 147 GCTATGACCA TGATTACGCC TGAAGTTGCT GCTCTTGGG 39 21 base pairsnucleic acid single linear 148 CACTTCCTCC TCATGCAAGT C 21 41 base pairsnucleic acid single linear 149 GCTATGACCA TGATTACGCC CACTTCCTCCTCATGCAAGT C 41 21 base pairs nucleic acid single linear 150 AGACTGGAGCCTCTGTGTTC G 21 39 base pairs nucleic acid single linear 151 TGTAAAACGACGGCCAGTAG ACTGGAGCCT CTGTGTTCG 39 41 base pairs nucleic acid singlelinear 152 GCTATGACCA TGATTACGCC AGACTGGAGC CTCTGTGTTC G 41 20 basepairs nucleic acid single linear 153 TGTGTGTCTA CCGGACTTGC 20 40 basepairs nucleic acid single linear 154 GCTATGACCA TGATTACGCC TGTGTGTCTACCGGACTTGC 40 21 
base pairs nucleic acid single linear 155 GAACAGAGGCAAGGTTTTCC C 21 41 base pairs nucleic acid single linear 156 GCTATGACCATGATTACGCC GAACAGAGGC AAGGTTTTCC C 41 19 base pairs nucleic acid singlelinear 157 AGAATCGCTT GAACCCAGG 19 39 base pairs nucleic acid singlelinear 158 GCTATGACCA TGATTACGCC AGAATCGCTT GAACCCAGG 39 20 base pairsnucleic acid single linear 159 GCTGGTTCCT AAAATGTGGC 20 38 base pairsnucleic acid single linear 160 TGTAAAACGA CGGCCAGTGC TGGTTCCTAA AATGTGGC38 40 base pairs nucleic acid single linear 161 GCTATGACCA TGATTACGCCGCTGGTTCCT AAAATGTGGC 40 22 base pairs nucleic acid single linear 162CATACGAGGT GAACACAAGG AC 22 42 base pairs nucleic acid single linear 163GCTATGACCA TGATTACGCC CATACGAGGT GAACACAAGG AC 42 20 base pairs nucleicacid single linear 164 TGAAGAGGTG GGGACAGTTG 20 40 base pairs nucleicacid single linear 165 GCTATGACCA TGATTACGCC TGAAGAGGTG GGGACAGTTG 40 21base pairs nucleic acid single linear 166 CTTGTGCCTT CCAGCTACAT C 21 39base pairs nucleic acid single linear 167 TGTAAAACGA CGGCCAGTCTTGTGCCTTCC AGCTACATC 39 41 base pairs nucleic acid single linear 168GCTATGACCA TGATTACGCC CTTGTGCCTT CCAGCTACAT C 41 20 base pairs nucleicacid single linear 169 AGTCCTGGCA CAGGGATTAG 20 40 base pairs nucleicacid single linear 170 GCTATGACCA TGATTACGCC AGTCCTGGCA CAGGGATTAG 40 20base pairs nucleic acid single linear 171 ATAACTGCAG CAAAGGCACC 20 40base pairs nucleic acid single linear 172 GCTATGACCA TGATTACGCCATAACTGCAG CAAAGGCACC 40 20 base pairs nucleic acid single linear 173GCTTCAGTGG ATCTTGCTGG 20 38 base pairs nucleic acid single linear 174TGTAAAACGA CGGCCAGTGC TTCAGTGGAT CTTGCTGG 38 40 base pairs nucleic acidsingle linear 175 GCTATGACCA TGATTACGCC GCTTCAGTGG ATCTTGCTGG 40 20 basepairs nucleic acid single linear 176 TGTGCAGTGC ACAACCTACC 20 40 basepairs nucleic acid single linear 177 GCTATGACCA TGATTACGCC TGTGCAGTGCACAACCTACC 40 20 base pairs nucleic acid single linear 178 GTTGTCGAGTGGCGTGCTAT 20 38 base pairs nucleic acid single linear 179 
TGTAAAACGA CGGCCAGTGT TGTCGAGTGG CGTGCTAT (38 bp)

Nucleic acid, single-stranded, linear (SEQ ID NOS:180-402):
SEQ ID NO:180 (40 bp): GCTATGACCA TGATTACGCC GTTGTCGAGT GGCGTGCTAT
SEQ ID NO:181 (20 bp): AAAAGTCCTG TGGGGTCTGA
SEQ ID NO:182 (40 bp): GCTATGACCA TGATTACGCC AAAAGTCCTG TGGGGTCTGA
SEQ ID NO:183 (20 bp): AGAAGTGTGG CCTCTGCTGT
SEQ ID NO:184 (38 bp): TGTAAAACGA CGGCCAGTAG AAGTGTGGCC TCTGCTGT
SEQ ID NO:185 (40 bp): GCTATGACCA TGATTACGCC AGAAGTGTGG CCTCTGCTGT
SEQ ID NO:186 (21 bp): GTGAAAGAGC CTGTGTTTGC T
SEQ ID NO:187 (41 bp): GCTATGACCA TGATTACGCC GTGAAAGAGC CTGTGTTTGC T
SEQ ID NO:188 (21 bp): AGACCCTGCT TCCAAATAAG C
SEQ ID NO:189 (39 bp): TGTAAAACGA CGGCCAGTAG ACCCTGCTTC CAAATAAGC
SEQ ID NO:190 (41 bp): GCTATGACCA TGATTACGCC AGACCCTGCT TCCAAATAAG C
SEQ ID NO:191 (20 bp): ACTCATTTTC TGCCTCTGCC
SEQ ID NO:192 (40 bp): GCTATGACCA TGATTACGCC ACTCATTTTC TGCCTCTGCC
SEQ ID NO:193 (20 bp): TGGCAGTCCT GTCAACCTCT
SEQ ID NO:194 (38 bp): TGTAAAACGA CGGCCAGTTG GCAGTCCTGT CAACCTCT
SEQ ID NO:195 (40 bp): GCTATGACCA TGATTACGCC TGGCAGTCCT GTCAACCTCT
SEQ ID NO:196 (20 bp): CACACAGGAT CTTGCACTGG
SEQ ID NO:197 (40 bp): GCTATGACCA TGATTACGCC CACACAGGAT CTTGCACTGG
SEQ ID NO:198 (20 bp): AGGGCCAGTT CTCATGAGTT
SEQ ID NO:199 (38 bp): TGTAAAACGA CGGCCAGTAG GGCCAGTTCT CATGAGTT
SEQ ID NO:200 (40 bp): GCTATGACCA TGATTACGCC AGGGCCAGTT CTCATGAGTT
SEQ ID NO:201 (20 bp): GGGCAAAGGA AGACACAATC
SEQ ID NO:202 (40 bp): GCTATGACCA TGATTACGCC GGGCAAAGGA AGACACAATC
SEQ ID NO:203 (20 bp): CAACTTCTGC TTTGAAGCCC
SEQ ID NO:204 (38 bp): TGTAAAACGA CGGCCAGTCA ACTTCTGCTT TGAAGCCC
SEQ ID NO:205 (40 bp): GCTATGACCA TGATTACGCC CAACTTCTGC TTTGAAGCCC
SEQ ID NO:206 (20 bp): GACAGACTTG GCAATCTCCC
SEQ ID NO:207 (40 bp): GCTATGACCA TGATTACGCC GACAGACTTG GCAATCTCCC
SEQ ID NO:208 (21 bp): TCTGCTCTCT GTTTGGAGTC C
SEQ ID NO:209 (39 bp): TGTAAAACGA CGGCCAGTTC TGCTCTCTGT TTGGAGTCC
SEQ ID NO:210 (41 bp): GCTATGACCA TGATTACGCC TCTGCTCTCT GTTTGGAGTC C
SEQ ID NO:211 (20 bp): CCCTAAACTC CACGTTCCTG
SEQ ID NO:212 (40 bp): GCTATGACCA TGATTACGCC CCCTAAACTC CACGTTCCTG
SEQ ID NO:213 (20 bp): GGGTTAATGT TGGCCACATC
SEQ ID NO:214 (40 bp): GCTATGACCA TGATTACGCC GGGTTAATGT TGGCCACATC
SEQ ID NO:215 (19 bp): TTGGCAGGGA TGTGTTGAG
SEQ ID NO:216 (37 bp): TGTAAAACGA CGGCCAGTTT GGCAGGGATG TGTTGAG
SEQ ID NO:217 (39 bp): GCTATGACCA TGATTACGCC TTGGCAGGGA TGTGTTGAG
SEQ ID NO:218 (20 bp): GTCTGCCACA TGTGCAAGAG
SEQ ID NO:219 (40 bp): GCTATGACCA TGATTACGCC GTCTGCCACA TGTGCAAGAG
SEQ ID NO:220 (20 bp): TGGTCTGAGT CTCGTGGGTA
SEQ ID NO:221 (38 bp): TGTAAAACGA CGGCCAGTTG GTCTGAGTCT CGTGGGTA
SEQ ID NO:222 (40 bp): GCTATGACCA TGATTACGCC TGGTCTGAGT CTCGTGGGTA
SEQ ID NO:223 (21 bp): GAGGTGGATT TGGGTGAGAT T
SEQ ID NO:224 (41 bp): GCTATGACCA TGATTACGCC GAGGTGGATT TGGGTGAGAT T
SEQ ID NO:225 (20 bp): AGCCCTCTCT GCAAGGAAAG
SEQ ID NO:226 (38 bp): TGTAAAACGA CGGCCAGTAG CCCTCTCTGC AAGGAAAG
SEQ ID NO:227 (40 bp): GCTATGACCA TGATTACGCC AGCCCTCTCT GCAAGGAAAG
SEQ ID NO:228 (20 bp): CAGAACGTGG AGTTCTGCTG
SEQ ID NO:229 (40 bp): GCTATGACCA TGATTACGCC CAGAACGTGG AGTTCTGCTG
SEQ ID NO:230 (20 bp): TACCGAATCC CACTCCTCTG
SEQ ID NO:231 (38 bp): TGTAAAACGA CGGCCAGTTA CCGAATCCCA CTCCTCTG
SEQ ID NO:232 (40 bp): GCTATGACCA TGATTACGCC TACCGAATCC CACTCCTCTG
SEQ ID NO:233 (20 bp): CATGGTAGAG GTGGGACCAT
SEQ ID NO:234 (38 bp): TGTAAAACGA CGGCCAGTCA TGGTAGAGGT GGGACCAT
SEQ ID NO:235 (40 bp): GCTATGACCA TGATTACGCC CATGGTAGAG GTGGGACCAT
SEQ ID NO:236 (20 bp): GATATCCACC TCTGCCCAAG
SEQ ID NO:237 (40 bp): GCTATGACCA TGATTACGCC GATATCCACC TCTGCCCAAG
SEQ ID NO:238 (20 bp): TTACAGGGGC ACAGAGAAGC
SEQ ID NO:239 (40 bp): GCTATGACCA TGATTACGCC TTACAGGGGC ACAGAGAAGC
SEQ ID NO:240 (20 bp): GCAACAGAGC AAGACCCTGT
SEQ ID NO:241 (40 bp): GCTATGACCA TGATTACGCC GCAACAGAGC AAGACCCTGT
SEQ ID NO:242 (19 bp): AAATTAGCCA GGCATGGTG
SEQ ID NO:243 (39 bp): GCTATGACCA TGATTACGCC AAATTAGCCA GGCATGGTG
SEQ ID NO:244 (38 bp): TGTAAAACGA CGGCCAGTGC AACAGAGCAA GACCCTGT
SEQ ID NO:245 (20 bp): CCTGCAGAAG GAAACCTGAC
SEQ ID NO:246 (40 bp): GCTATGACCA TGATTACGCC CCTGCAGAAG GAAACCTGAC
SEQ ID NO:247 (19 bp): CTGCATCTTT GCCACCATG
SEQ ID NO:248 (39 bp): GCTATGACCA TGATTACGCC CTGCATCTTT GCCACCATG
SEQ ID NO:249 (38 bp): TGTAAAACGA CGGCCAGTCC TGCAGAAGGA AACCTGAC
SEQ ID NO:250 (20 bp): TTCCCAGGAG GCAAGTTATG
SEQ ID NO:251 (40 bp): GCTATGACCA TGATTACGCC TTCCCAGGAG GCAAGTTATG
SEQ ID NO:252 (20 bp): TGGGCTTAGG TGATCCTCAC
SEQ ID NO:253 (40 bp): GCTATGACCA TGATTACGCC TGGGCTTAGG TGATCCTCAC
SEQ ID NO:254 (38 bp): TGTAAAACGA CGGCCAGTTT CCCAGGAGGC AAGTTATG
SEQ ID NO:255 (20 bp): ACCAAGCCCA ACTAATCAGC
SEQ ID NO:256 (40 bp): GCTATGACCA TGATTACGCC ACCAAGCCCA ACTAATCAGC
SEQ ID NO:257 (20 bp): ATGCCTGTAA TCCCAGCACT
SEQ ID NO:258 (40 bp): GCTATGACCA TGATTACGCC ATGCCTGTAA TCCCAGCACT
SEQ ID NO:259 (38 bp): TGTAAAACGA CGGCCAGTAC CAAGCCCAAC TAATCAGC
SEQ ID NO:260 (20 bp): ACTGCAAGCC CTCTCTGAAC
SEQ ID NO:261 (20 bp): CGAAGACTGC GAAACAGACA
SEQ ID NO:262 (20 bp): CTAGTGCCGT GCAGAATGAG
SEQ ID NO:263 (20 bp): GGCCACTGCA ATGAGATACA
SEQ ID NO:264 (20 bp): GAGAAACAGT TCCAGGGTGG
SEQ ID NO:265 (40 bp): GCTATGACCA TGATTACGCC GAGAAACAGT TCCAGGGTGG
SEQ ID NO:266 (20 bp): AAACTGAGGC TGGGAGAGGT
SEQ ID NO:267 (40 bp): GCTATGACCA TGATTACGCC AAACTGAGGC TGGGAGAGGT
SEQ ID NO:268 (20 bp): TGTTCTTCCT CACAGGGAGG
SEQ ID NO:269 (40 bp): GCTATGACCA TGATTACGCC TGTTCTTCCT CACAGGGAGG
SEQ ID NO:270 (20 bp): TCCCCAAATC TGTCCAGTTC
SEQ ID NO:271 (40 bp): GCTATGACCA TGATTACGCC TCCCCAAATC TGTCCAGTTC
SEQ ID NO:272 (20 bp): CATACCTGGA GGGATGCTTG
SEQ ID NO:273 (40 bp): GCTATGACCA TGATTACGCC CATACCTGGA GGGATGCTTG
SEQ ID NO:274 (20 bp): TAGGTTGCTG TGTGGCTTCA
SEQ ID NO:275 (40 bp): GCTATGACCA TGATTACGCC TAGGTTGCTG TGTGGCTTCA
SEQ ID NO:276 (20 bp): CTTCTGACAA AGCAGAGGCC
SEQ ID NO:277 (40 bp): GCTATGACCA TGATTACGCC CTTCTGACAA AGCAGAGGCC
SEQ ID NO:278 (20 bp): GCTGTTAGGG TTACCATCGC
SEQ ID NO:279 (40 bp): GCTATGACCA TGATTACGCC GCTGTTAGGG TTACCATCGC
SEQ ID NO:280 (20 bp): CCACAGGGTG ATATGCTGTC
SEQ ID NO:281 (40 bp): GCTATGACCA TGATTACGCC CCACAGGGTG ATATGCTGTC
SEQ ID NO:282 (20 bp): CGCCTGGCTA CTTTGGTACT
SEQ ID NO:283 (40 bp): GCTATGACCA TGATTACGCC CGCCTGGCTA CTTTGGTACT
SEQ ID NO:284 (19 bp): CCAAATGAAC CTGGGCAAC
SEQ ID NO:285 (39 bp): GCTATGACCA TGATTACGCC CCAAATGAAC CTGGGCAAC
SEQ ID NO:286 (20 bp): GTCTTGGCTC ACTGCAACCT
SEQ ID NO:287 (40 bp): GCTATGACCA TGATTACGCC GTCTTGGCTC ACTGCAACCT
SEQ ID NO:288 (20 bp): GCCAAGACTG TGCTACTGCA
SEQ ID NO:289 (20 bp): CAGGGAGCAG ATCTTACCCA
SEQ ID NO:290 (20 bp): TGGGATTAAC TAGGGAGGGG
SEQ ID NO:291 (40 bp): GCTATGACCA TGATTACGCC TGGGATTAAC TAGGGAGGGG
SEQ ID NO:292 (20 bp): TGCTGCTGTC TCCATCTCTG
SEQ ID NO:293 (40 bp): GCTATGACCA TGATTACGCC TGCTGCTGTC TCCATCTCTG
SEQ ID NO:294 (21 bp): ACAGACCAGC AGTGAAACCT G
SEQ ID NO:295 (41 bp): GCTATGACCA TGATTACGCC ACAGACCAGC AGTGAAACCT G
SEQ ID NO:296 (20 bp): GTTCACTGCA ACCTCTGCCT
SEQ ID NO:297 (40 bp): GCTATGACCA TGATTACGCC GTTCACTGCA ACCTCTGCCT
SEQ ID NO:298 (21 bp): GTTCTCGTAG ATGCTTGCAG G
SEQ ID NO:299 (41 bp): GCTATGACCA TGATTACGCC GTTCTCGTAG ATGCTTGCAG G
SEQ ID NO:300 (20 bp): GAGGCAGGAG GATCACTTGA
SEQ ID NO:301 (40 bp): GCTATGACCA TGATTACGCC GAGGCAGGAG GATCACTTGA
SEQ ID NO:302 (20 bp): TGAGCTGAGA TCACACCGCT
SEQ ID NO:303 (40 bp): GCTATGACCA TGATTACGCC TGAGCTGAGA TCACACCGCT
SEQ ID NO:304 (20 bp): AGTTGACACT TTGCTGGCCT
SEQ ID NO:305 (40 bp): GCTATGACCA TGATTACGCC AGTTGACACT TTGCTGGCCT
SEQ ID NO:306 (20 bp): CTCTGCATGG CTTAGGGACA
SEQ ID NO:307 (40 bp): GCTATGACCA TGATTACGCC CTCTGCATGG CTTAGGGACA
SEQ ID NO:308 (20 bp): GGCTGCTCTC TGCATTCTCT
SEQ ID NO:309 (40 bp): GCTATGACCA TGATTACGCC GGCTGCTCTC TGCATTCTCT
SEQ ID NO:310 (21 bp): CTGGCTTTAG CTTGCATTTC C
SEQ ID NO:311 (41 bp): GCTATGACCA TGATTACGCC CTGGCTTTAG CTTGCATTTC C
SEQ ID NO:312 (21 bp): TGCCTCAGTT TTCTCACCTG T
SEQ ID NO:313 (41 bp): GCTATGACCA TGATTACGCC TGCCTCAGTT TTCTCACCTG T
SEQ ID NO:314 (20 bp): CAAACAGCCA CTGAGCATGT
SEQ ID NO:315 (40 bp): GCTATGACCA TGATTACGCC CAAACAGCCA CTGAGCATGT
SEQ ID NO:316 (20 bp): TCCTCCTGTA GATGCCCAAG
SEQ ID NO:317 (40 bp): GCTATGACCA TGATTACGCC TCCTCCTGTA GATGCCCAAG
SEQ ID NO:318 (22 bp): GCCGAGAATT GTCATCTTAA CT
SEQ ID NO:319 (22 bp): GGATTGAAAG CTGCAAACTA CA
SEQ ID NO:320 (20 bp): GGAGCCACCA CATCCAGTTA
SEQ ID NO:321 (18 bp): TGGAGGGATT GCTTGAGG
SEQ ID NO:322 (20 bp): AGGTGTACAC CACCATGCCT
SEQ ID NO:323 (19 bp): TGGTGCCAAT TATTGCTGC
SEQ ID NO:324 (22 bp): AGATCTTATA CACATGTGCG CG
SEQ ID NO:325 (21 bp): AGGTGACATC ACTTACAGCG G
SEQ ID NO:326 (18 bp): ATTACCCAGG CATGGTGC
SEQ ID NO:327 (20 bp): CAGGCACTTC TTCCAGGTCT
SEQ ID NO:328 (20 bp): AGGGTTACAC TGGAGTTTGC
SEQ ID NO:329 (25 bp): AAACCTTCAA TGTGTTCATT AAAAC
SEQ ID NO:330 (20 bp): TCAACTTTAT TGGGGGTTTA
SEQ ID NO:331 (20 bp): AAGGTAAAAG TCCAAAATGG
SEQ ID NO:332 (21 bp): GGACAGTCAG TTATTGAAAT G
SEQ ID NO:333 (20 bp): TTTCCTCTCT GGGAGTCTCT
SEQ ID NO:334 (20 bp): TCAAGCTGGA GTCCACCATC
SEQ ID NO:335 (19 bp): CACTCGCTGT GAGGAGGAC
SEQ ID NO:336 (20 bp): ACAACGGCAG GACGTGTAAG
SEQ ID NO:337 (19 bp): ATTGCCATCG ACTACGACC
SEQ ID NO:338 (20 bp): TGGTCAACAC CGAGATCAAC
SEQ ID NO:339 (20 bp): AACCTCTACT GGACCGACAC
SEQ ID NO:340 (19 bp): CTCATGTACT GGACAGACT
SEQ ID NO:341 (20 bp): GAGACGCCAA GACAGACAAG
SEQ ID NO:342 (20 bp): CAGTCCAGTA GATGAAGTCC
SEQ ID NO:343 (20 bp): GTGAAGAAGC ACAGGTGGCT
SEQ ID NO:344 (20 bp): TCATGTCACT CAGCAGCTCC
SEQ ID NO:345 (20 bp): CCGTTGTTGT GCATACAGTC
SEQ ID NO:346 (20 bp): GTGGCACATG CAAACTGGTC
SEQ ID NO:347 (28 bp): GCTCTAGAGT ACAAAGTTCT CCCAGCCC
SEQ ID NO:348 (54 bp): ATCCTCGGGG TCTTCCGGGG CGAGTTCTGG CTGGCTACTG CTGTGGGCCG GGCT
SEQ ID NO:349 (54 bp): TGGATATCTC AGTGGTGGTG GTGGTGGTGC TCGACATCCT CGGGGTCTTC CGGG
SEQ ID NO:350 (35 bp): TAGAATTCGC CGCCACCATG GAGGCAGCGC CGCCC
SEQ ID NO:351 (17 bp): GAGGCGGGAG CAAGAGG
SEQ ID NO:352 (26 bp): GCAAGCTTCA TGGAGCCCGA GTGAGC
SEQ ID NO:353 (17 bp): ATGGAGCCCG AGTGAGC
SEQ ID NO:354 (17 bp): TCACTCGGGC TCCATGG
SEQ ID NO:355 (20 bp): TGCTGTACTG CAGCTTGGTC
SEQ ID NO:356 (21 bp): ATGCAGCTGC TGTAGACTTC C
SEQ ID NO:357 (20 bp): GTCTGTTTGA TGGCCTCCTC
SEQ ID NO:358 (20 bp): ATGTTCTGTG CAGCACCTCC
SEQ ID NO:359 (18 bp): GCCATCAGGT GACACGAG
SEQ ID NO:360 (21 bp): AAGGTTCTCT TCTGGCAGGA C
SEQ ID NO:361 (19 bp): CCAGTCAGTC CAGTACATG
SEQ ID NO:362 (20 bp): TCGACCTGGA GGAACAGAAG
SEQ ID NO:363 (20 bp): AAGCTCAGCT TCATCCACCG
SEQ ID NO:364 (20 bp): ATGAAGCTGA GCTTGGCATC
SEQ ID NO:365 (22 bp): AGCAGAGGAA GGAGATCCTT AG
SEQ ID NO:366 (20 bp): TCCATGGGTG AGTACAGAGC
SEQ ID NO:367 (20 bp): ATTGTCCTGC AACTGCACAC
SEQ ID NO:368 (19 bp): GCCATTGCCA TTGACTACG
SEQ ID NO:369 (21 bp): GGATCGTAGT CAATGGCAAT G
SEQ ID NO:370 (20 bp): GAATTGAGGT GACTCGCCTC
SEQ ID NO:371 (20 bp): CCTCAATTCT GTAGTGCCTG
SEQ ID NO:372 (19 bp): TGTGTTGCAC CCTGTGATG
SEQ ID NO:373 (19 bp): ATCTAGGTTG GCGCATTCG
SEQ ID NO:374 (19 bp): AGGTGTTCAC CAGGACATG
SEQ ID NO:375 (29 bp): GCGAGCTCCC GTCTATGTTG ATCACCTCG
SEQ ID NO:376 (20 bp): GACCTGATGG GACTCAAAGC
SEQ ID NO:377 (20 bp): GCTGGTGAAT ACCAGGAAGG
SEQ ID NO:378 (20 bp): ACGATGTGGC TATCCCACTC
SEQ ID NO:379 (20 bp): AGTAGGATCC AGAGCCAGAG
SEQ ID NO:380 (20 bp): AGCGCATGGT GATAGCTGAC
SEQ ID NO:381 (21 bp): CGTTCAATGC TATGCAGGTT C
SEQ ID NO:382 (20 bp): GTGCTTCACA CTACACGCTG
SEQ ID NO:383 (19 bp): CAGCCAGAAA TTTGCCATC
SEQ ID NO:384 (20 bp): TCCGGCTGTA GATGTCAATG
SEQ ID NO:385 (21 bp): AGGCCACCAA CACTATCAAT G
SEQ ID NO:386 (20 bp): TACCCTCGCT CAGCATTGAC
SEQ ID NO:387 (19 bp): CTGGAAGATG CCAACATCG
SEQ ID NO:388 (20 bp): TGAACCCTAG TCCGCTTGTC
SEQ ID NO:389 (20 bp): CTGCAGAACC TGCTGACTTG
SEQ ID NO:390 (21 bp): CCAGAGTGAT GAAGAAGGCT G
SEQ ID NO:391 (20 bp): TCACTCTGGT CAGCACACTC
SEQ ID NO:392 (20 bp): CAGGATCGCT CTGATGAAGC
SEQ ID NO:393 (21 bp): GCAGTTAGCT TCATCAGAGC G
SEQ ID NO:394 (20 bp): ACCCTCTGAT GACATCCCAG
SEQ ID NO:395 (18 bp): AATGGCACTG CTGTGGGC
SEQ ID NO:396 (20 bp): AGGCTCATGG AGCTCATCAC
SEQ ID NO:397 (20 bp): ATAGTGTGGC CTTTGTGCTG
SEQ ID NO:398 (20 bp): GTCATTCGAG GTATGGCACC
SEQ ID NO:399 (21 bp): GGTAGTATTT GCTGCTCTTC C
SEQ ID NO:400 (27 bp): GCTCTAGAAA AGTTTCCCAG CCCTGCC
SEQ ID NO:401 (19 bp): CTGGAAGATG CCAACATCG
SEQ ID NO:402 (62 bp): GCTCTAGACT AGTGATGGTG ATGGTGATGA CTGCTGTGGG CTGGGATGTC ATCAGAGGGT GG

Amino acid, linear (SEQ ID NOS:403-406):
SEQ ID NO:403 (17 aa): Ser Tyr Phe His Leu Phe Pro Pro Pro Pro Ser Pro Cys Thr Asp Ser Ser
SEQ ID NO:404 (15 aa): Val Asp Gly Arg Gln Asn Ile Lys Arg Ala Lys Asp Asp Gly Thr
SEQ ID NO:405 (18 aa): Glu Val Leu Phe Thr Thr Gly Leu Ile Arg Pro Val Ala Leu Val Val Asp Asn
SEQ ID NO:406 (16 aa): Ile Gln Gly His Leu Asp Phe Val Met Asp Ile Leu Val Phe His Ser

Nucleic acid, single-stranded, linear (SEQ ID NOS:407-410):
SEQ ID NO:407 (27 bp): CCATCCTAAT ACGACTCACT ATAGGGC
SEQ ID NO:408 (23 bp): ACTCACTATA GGGCTCGAGC GGC
SEQ ID NO:409 (18 bp): TGTAAAACGA CGGCCAGT
SEQ ID NO:410 (20 bp): GCTATGACCA TGATTACGCC

Nucleic acid, double-stranded, linear (SEQ ID NOS:411-455):
SEQ ID NO:411 (16 bp): CCGGGTCAAC ATGGAG
SEQ ID NO:412 (16 bp): CCGCGGGTAG GTGGGC
SEQ ID NO:413 (16 bp): TGCCCCACAG CCTCGC
SEQ ID NO:414 (16 bp): TCACGGGTAA ACCCTG
SEQ ID NO:415 (16 bp): CCCGTCACAG GTACAT
SEQ ID NO:416 (16 bp): GTTCCGGTAG GTACCC
SEQ ID NO:417 (16 bp): CTGACTGCAG GCAGAA
SEQ ID NO:418 (16 bp): CTTTCTGTGA GTGCCG
SEQ ID NO:419 (16 bp): GTTTTCCCAG TCCACA
SEQ ID NO:420 (16 bp): AGGCAGGTGA GGCGGT
SEQ ID NO:421 (16 bp): GTCTCCACAG GAGCCG
SEQ ID NO:422 (16 bp): GATGGGGTAA GACGGG
SEQ ID NO:423 (16 bp): TCTTCTCCAG CCTCAT
SEQ ID NO:424 (16 bp): ATCGAGGTGA GGCTCC
SEQ ID NO:425 (16 bp): CGTCCTGCAG GTGATC
SEQ ID NO:426 (16 bp): TCGTCGGTGA GTCCGG
SEQ ID NO:427 (16 bp): TCGCTTCCAG GAACCA
SEQ ID NO:428 (16 bp): CTGAAGGTAG CGTGGG
SEQ ID NO:429 (16 bp): CTGCTGCCAG ACCATC
SEQ ID NO:430 (16 bp): CAAGGGGTAA GTGTTT
SEQ ID NO:431 (16 bp): TGCCTTCCAG CTACAT
SEQ ID NO:432 (16 bp): TGCTGGGTGA GGGCCG
SEQ ID NO:433 (16 bp): GTTCATGCAG GTCAGG
SEQ ID NO:434 (16 bp): GCAGCCGTAA GTGCCT
SEQ ID NO:435 (16 bp): CCTCCTCTAG CGCCCA
SEQ ID NO:436 (16 bp): ACCCAGGCAG GTGCCC
SEQ ID NO:437 (16 bp): TGTCTTACAG CCCTTT
SEQ ID NO:438 (16 bp): GCGAGGGTAG GAGGCC
SEQ ID NO:439 (16 bp): CCTCCCGCAG GTACCT
SEQ ID NO:440 (16 bp): TGTCAGGTAA GGGGCC
SEQ ID NO:441 (16 bp): CTGCTTGCAG GGGCCA
SEQ ID NO:442 (16 bp): AGTTCTGTAC GTGGGG
SEQ ID NO:443 (16 bp): GTCTTTGCAG CAGCCC
SEQ ID NO:444 (16 bp): GTGGAGGTAG GTGTGA
SEQ ID NO:445 (16 bp): CCTCCCCCAG AGCCGC
SEQ ID NO:446 (16 bp): GTGACGGTGA GGCCCT
SEQ ID NO:447 (16 bp): TCCCTTGCAG CCATCT
SEQ ID NO:448 (16 bp): TGTGTGGTGA GCCAGC
SEQ ID NO:449 (16 bp): TCTCTGGCAG AAATCA
SEQ ID NO:450 (16 bp): TCACAGGTAA GGAGCC
SEQ ID NO:451 (16 bp): TCCCTGCCAG GCATCG
SEQ ID NO:452 (16 bp): CCGCCGGTGA GGGGCG
SEQ ID NO:453 (16 bp): CTCTCCTCAG ATCCTG
SEQ ID NO:454 (16 bp): GTACAGGTAG GACATC
SEQ ID NO:455 (16 bp): TCCCTTTCAG GCCCTA
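Many of the single-stranded primers above appear as a set: a gene-specific sequence on its own, followed by the same sequence carrying a 5' extension identical to SEQ ID NO:409 or SEQ ID NO:410 (for example, SEQ ID NOS:183, 184 and 185). The Python sketch below illustrates that relationship under stated assumptions: it treats SEQ ID NOS:409 and 410 as the only tails, ignores the 10-base spacing used in the listing, and uses the hypothetical helper names `normalise` and `split_tail`. It is an illustration, not a method described in the application.

```python
from typing import Optional, Tuple

# Assumption: the only 5' tails of interest are SEQ ID NOS:409 and 410 as listed.
TAILS = {
    "SEQ ID NO:409": "TGTAAAACGACGGCCAGT",
    "SEQ ID NO:410": "GCTATGACCATGATTACGCC",
}


def normalise(seq: str) -> str:
    """Drop the 10-base spacing used in the listing and upper-case the bases."""
    return "".join(seq.split()).upper()


def split_tail(primer: str) -> Tuple[Optional[str], str]:
    """Return (tail name, gene-specific remainder).

    The tail name is None when the primer starts with neither listed tail,
    in which case the whole (normalised) primer is returned as the remainder.
    """
    bases = normalise(primer)
    for name, tail in TAILS.items():
        if bases.startswith(tail):
            return name, bases[len(tail):]
    return None, bases


if __name__ == "__main__":
    # SEQ ID NO:184 should reduce to the 20-mer listed as SEQ ID NO:183.
    name, core = split_tail("TGTAAAACGA CGGCCAGTAG AAGTGTGGCC TCTGCTGT")
    print(name, core, core == normalise("AGAAGTGTGG CCTCTGCTGT"))
```

Applied to SEQ ID NO:184, `split_tail` reports the SEQ ID NO:409 tail and a remainder identical to SEQ ID NO:183; the same check on SEQ ID NO:185 reports the SEQ ID NO:410 tail.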

What is claimed is:
 1. An isolated polypeptide comprising the amino acid sequence shown in FIG. 5(e) (SEQ ID NO:4).
 2. A polypeptide comprising the amino acid sequence shown in FIG. 5(c) (SEQ ID NO:3).
 3. A fragment of a polypeptide consisting of at least 5 contiguous amino acids of an amino acid sequence selected from the amino acid sequences of FIG. 5(c) (SEQ ID NO:3) and FIG. 5(e) (SEQ ID NO:4).
 4. A fragment according to claim 3 which has an amino acid sequence selected from: SYFHLFPPPPSPCTDSS (SEQ ID NO:403), VDGRQNIKRAKDDGT (SEQ ID NO:404), EVLFTTGLIRPVALVVDN (SEQ ID NO:405), and IQGHLDFVMDILVFHS (SEQ ID NO:406).
 5. A fragment according to claim 3 which comprises the LRP5 extracellular domain.
 6. A fragment according to claim 3 which comprises the LRP5 cytoplasmic domain.
 7. A recombinant method of producing the polypeptide of claim 1 comprising expressing the polypeptide of SEQ ID NO: 4.
 8. A method according to claim 7 further comprising isolating and/or purifying the polypeptide.
 9. A method according to claim 7 further comprising formulating the polypeptide into a composition which includes at least one additional component.
 10. A composition comprising the polypeptide of claim 1 and a physiologically acceptable excipient.
 11. A recombinant method of producing a polypeptide fragment of claim 3 comprising expressing the fragment of a polypeptide consisting of at least 5 contiguous amino acids selected from the group consisting of SEQ ID NO: 3 and SEQ ID NO: 4.
 12. A method according to claim 11 further comprising isolating and/or purifying the polypeptide.
 13. A method according to claim 11 further comprising formulating the polypeptide into a composition which includes at least one additional component.
 14. A composition comprising the fragment polypeptide of claim 3 and a physiologically acceptable excipient.
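Claim 4 recites SEQ ID NOS:403 to 406 in one-letter code, whereas the sequence listing gives the same peptides in three-letter code. The Python sketch below, offered only as an illustration, converts the listed three-letter strings and compares them with the claim text. The helper `is_contiguous_fragment` is a hypothetical, simplified reading of the "at least 5 contiguous amino acids" wording of claim 3 as a plain substring test; it is not a legal construction, and a real check would need the full SEQ ID NOS:3 and 4, which are not reproduced in this section.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert space-separated three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split())


def is_contiguous_fragment(candidate: str, reference: str, min_len: int = 5) -> bool:
    """True if candidate has at least min_len residues and occurs as a run in reference."""
    return len(candidate) >= min_len and candidate in reference


# Three-letter strings copied from SEQ ID NOS:403-406 of the listing.
LISTING = {
    403: "Ser Tyr Phe His Leu Phe Pro Pro Pro Pro Ser Pro Cys Thr Asp Ser Ser",
    404: "Val Asp Gly Arg Gln Asn Ile Lys Arg Ala Lys Asp Asp Gly Thr",
    405: "Glu Val Leu Phe Thr Thr Gly Leu Ile Arg Pro Val Ala Leu Val Val Asp Asn",
    406: "Ile Gln Gly His Leu Asp Phe Val Met Asp Ile Leu Val Phe His Ser",
}

# One-letter strings as recited in claim 4.
CLAIM_4 = {
    403: "SYFHLFPPPPSPCTDSS",
    404: "VDGRQNIKRAKDDGT",
    405: "EVLFTTGLIRPVALVVDN",
    406: "IQGHLDFVMDILVFHS",
}

if __name__ == "__main__":
    for seq_id, three in LISTING.items():
        one = to_one_letter(three)
        print(seq_id, one, one == CLAIM_4[seq_id])
```

Each of the four comparisons prints `True`, so the one-letter strings in claim 4 match the three-letter entries in the listing.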